Abstract

We discuss a model of heterogeneous, inductive rational agents inspired by the El 
Farol Bar problem and the Minority Game. As in markets, agents interact through 
a collective aggregate variable - which plays a role similar to price - whose value 
is fixed by all of them. Agents follow a simple reinforcement-learning dynamics 
where the reinforcement, for each of their available strategies, is related to the 
payoff delivered by that strategy. We derive the exact solution of the model in the 
"thermodynamic" limit of infinitely many agents using tools of statistical physics 
of disordered systems. Our results show that the impact of agents on the market 
price plays a key role: even though price has a weak dependence on the behavior of 
each individual agent, the collective behavior crucially depends on whether agents 
account for such dependence or not. Remarkably, if the adaptive behavior of agents 
accounts even "infinitesimally" for this dependence they can, in a whole range of 
parameters, reduce global fluctuations by a finite amount. Both global efficiency 
and individual utility improve with respect to a "price taker" behavior if agents 
account for their market impact. 
1 Introduction



The El Farol bar problem [1] has become a popular paradigm of complex
systems. It describes a situation where N persons have to choose whether
or not to go to a bar which is enjoyable only if it is not too crowded. In order
to choose, each person forms mental schemes, hypotheses or behavioral rules
based on her beliefs, and she adopts the most successful one on the basis of past
performance. Inductive [1], low [2] or generally bounded rationality based on
learning theory [3] is regarded as a more realistic approach to the behavior of
real agents in complex strategic situations [4]. This is especially true in contexts
involving many heterogeneous agents with limited information, such as the El
Farol bar problem. Theoretical advances beyond numerical simulations are
technically very hard to achieve on these problems, and they have been regarded
as a major step forward in the understanding of complex systems [5].

The minority game [6,7] represents a first step in this direction. It describes
a system of interacting agents with inductive rationality who face the
problem of finding which of two alternatives will be chosen by the minority.
This problem is quite similar in nature to the El Farol bar problem, as the
result for each agent depends on what all other agents do and there is
no a priori best alternative. The same kind of situation arises generally in
systems of many interacting adaptive agents, such as markets [7,8].

Numerical simulations by several authors [6,8-12] have shown that the minority
game (MG) displays a remarkably rich emergent collective behavior, which
has been qualitatively understood to some extent by approximate schemes
[7,13,14]. In this paper, which follows refs. [11,15], we study a generalized
minority game and show that a full statistical characterization of its stationary
state can be derived analytically in the "thermodynamic" limit of infinitely
many agents. Our approach is based on tools and ideas of statistical physics
of disordered systems [16].

The minority game, like the El Farol bar problem, allows for a relatively easy definition in words. This may be enough for setting up a computer code to run numerical simulations, but it is clearly insufficient for an analytical approach. Therefore we shall, in the next section, carefully define its mathematical formulation. We shall only briefly discuss its motivation, for which we refer the reader to refs. [6-8]. Even though the behavioral assumptions on which the MG is based may be questionable when applied to financial markets (see sect. 2.4), we still find it convenient to consider and discuss the model as a toy model for a market, in line with refs. [7,8,17]. The relation to markets, at this level, may just be seen as a convenient language to discuss the results in simple terms. This choice reflects our taste, and surely more work needs to be done to show the relation of the minority game with real financial markets. We believe, however, that because of the statistical nature of the collective behavior - which is usually quite robust with respect to microscopic changes - our results may be qualitatively representative of generic systems of agents interacting through a global quantity via a minority mechanism, such as markets. The minority game indeed captures the essential interaction between agents' beliefs and market fluctuations - how individual beliefs, processing fluctuations, produce fluctuations in their turn. This interaction is usually bypassed in mathematical economics by assuming market efficiency, i.e. that prices instantaneously react to and incorporate agents' beliefs. The motivation underlying the efficient market hypothesis is, in a few words, that if there were inefficiencies - or arbitrage opportunities - these would be exploited by speculators in the market and washed out very quickly. Implicitly one is assuming that there is an infinite number of agents in the market who are using very sophisticated strategies which can detect, exploit and eliminate arbitrages very quickly. As "stylized" as it may be, the minority game allows one to study how a finite number of heterogeneous agents interact in a complex system such as a market. It allows one to ask to what extent this "stylized" market is inefficient, and how and to what extent agents really exploit arbitrage opportunities.

After defining the stage game, we shall briefly discuss its Nash equilibria: these are the reference equilibria of deductive rational agents. Finally we shall pass to the repeated game with adaptive agents who follow exponential learning. We show that the key difference between agents playing a Nash equilibrium and agents in the usual minority game is not that the first are deductive whereas the latter are inductive. Rather, the key issue is whether agents account for their "market impact" or not. By market impact we refer to the fact that the choice of each agent affects aggregate quantities, such as prices. In the minority game [7,8] agents behave as "price takers", i.e. as if their choices did not affect the aggregate. However, due to the minority nature of the interaction, the market impact reduces the "perceived" performance of strategies which agents use in the market with respect to those which are not used and whose performance is monitored on the basis of a virtual trade (assuming the same price). In order to analyze this issue in detail, we generalize the MG and allow agents to assign an extra reward $\eta$ to a strategy when it is played. This parameter allows agents to account for their market impact, and it plays the same role as the Onsager reaction term, or cavity field, in spin glasses [16].

Our main results are: 

(1) We derive a continuum time limit for the dynamics of learning. 

(2) We show that this dynamics admits a Lyapunov function, i.e. a function of all relevant dynamical variables which decreases along all trajectories of the dynamics. This is a very important result since it turns the problem of studying the stationary state of a stochastic dynamical system into that of characterizing the (local) minima of a function. Considering this function as a Hamiltonian, we can apply the tools of statistical mechanics to solve the problem.

(3) When agents do not consider their impact on the market, as in the minority game ($\eta = 0$), i) the stationary state is unique, ii) the Lyapunov function is a measure of the asymmetry of the market (in loose words, agents minimize the market's predictability), and iii) when the number of agents exceeds a critical number the market becomes symmetric and unpredictable, with large fluctuations, as first observed in [8].

(4) If agents know how the aggregate variable depends on their behavior, they can consider their impact on the market. We refer to this as the full information case, since agents have full information on how the aggregate would have changed for each of their choices. In this case, i) there are exponentially many stationary states, ii) these states are Nash equilibria, and iii) the Lyapunov function measures the market's fluctuations, which means that agents cooperate optimally in maximizing global wealth when maximizing their own utility. As a result, fluctuations always decrease as the number of agents increases.

(5) This state is recovered when $\eta = 1$. This means that agents need not have full information in order to reach this optimal state. It is enough that they over-reward the strategy they are currently playing with respect to those they are not playing, by a quantity $\eta$.

(6) Any $\eta > 0$ implies an improvement both in individual payoffs - as shown in sect. 8 - and in global efficiency with respect to the $\eta = 0$ case.

(7) The most striking result comes when asking how the collective behavior interpolates between the two quite different limits when $\eta$ changes from 0 to 1. The result is that when there are few agents the change is mild and continuous - even though there is a phase transition, it is a continuous (second order) one. When there are many agents the change happens suddenly and discontinuously as soon as $\eta > 0$. Even an infinitesimal $\eta$ is enough to reduce the market's fluctuations by a finite amount.

These results suggest that the neglect of market impact - which seems an
innocent approximation^1 and is usually at the very basis of mathematical
economics and finance^2 - plays a very important role in complex systems such
as markets.

^1 The relative impact of each agent on the aggregate vanishes as $N \to \infty$.

^2 For example, in determining optimal investment strategies or pricing, it is
customary to consider prices as just exogenous processes, independent of the trading
strategy actually adopted.
2 The stage game: strategic structure

2.1 Actions and payoffs 

The minority game describes a situation where a large number N of agents have to take one of two opposite actions - such as "buy" or "sell" - and only those agents who choose the minority action are rewarded. This is similar to the El Farol bar problem, where each of N agents may choose either to go or not to go to a bar which is enjoyable only when it is not too crowded. In order to model this situation, let $\mathcal{N} = (1, \ldots, N)$ be the set of agents and let $\mathcal{A} = (-1, +1)$ be the set of the two possible actions. If $a_i \in \mathcal{A}$ is the action of agent $i \in \mathcal{N}$, the payoff to agent i is given by

$$u_i(a_i, a_{-i}) = -a_i A \quad \hbox{where} \quad A = \sum_{j \in \mathcal{N}} a_j, \qquad (1)$$

where $a_{-i} = \{a_j,\ j \ne i\}$ stands for the opponents' actions. The game rewards the minority group. To see this, note that the total payoff to agents, $\sum_i u_i = -A^2$, is always negative. Then the majority of agents, who have $a_i = {\rm sign}\, A$, receive a negative payoff $-|A|$, whereas the minority "wins" a payoff of $|A|$. Eq. (1) can be generalized to $u_i(a_i, a_{-i}) = -a_i U(A)$: if the function $U(x)$ is such that $x U(x) = -x U(-x) > 0$ for all $x \in \mathbb{R}$, the game again rewards the minority. The original model [6,7] takes $U(x) = {\rm sign}\, x$, but the collective behavior is qualitatively the same [11] as that of the linear case $U(x) = x$ on which we focus. Note that the "inversion" symmetry $u_i(-a_i, -a_{-i}) = u_i(a_i, a_{-i})$ implies that the two actions are a priori equivalent: there cannot be any best action, because otherwise everybody would take it and lose.
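
The payoff rule of Eq. (1) can be checked on a small example. The following is only an illustrative sketch in Python; the function name `payoffs` and the sample action profile are our own choices and are not part of the model definition.

```python
def payoffs(actions):
    """Return u_i = -a_i * A for every agent, where A is the sum of all actions."""
    total = sum(actions)               # aggregate quantity A
    return [-a * total for a in actions]

# Illustrative profile: three agents choose +1 (majority), two choose -1 (minority).
actions = [+1, +1, +1, -1, -1]
u = payoffs(actions)
A = sum(actions)

print(u)                  # majority agents receive -|A|, minority agents +|A|
print(sum(u), -A * A)     # the total payoff equals -A^2
```

With this profile A = 1, so the three majority agents each lose 1 and the two minority agents each win 1, and the total payoff is -A^2 = -1.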

The key issue, clearly, is that of coordination. With respect to coordination games [18, chapt. 6], we remark that agents cannot communicate. If communication were possible, agents would have incentives to stipulate contracts - such as "We toss a coin: if the outcome is heads I do $a_{\rm me} = +1$ and you do $a_{\rm you} = -1$, and if it is tails we do it the other way round". Both players would benefit from this contract because it transforms the negative sum game into a zero sum game for the two players. The contract would then be self-enforcing.

Agents interact only through a global or aggregate quantity A which is produced by all of them. This type of interaction is typical of market systems [7] and it is similar to the long-range interaction assumed in mean-field models of statistical physics [16]. Finally, note that the El Farol bar problem has a similar structure but with A replaced by $(A - A_0)$ in Eq. (1), where $A_0$ is related to the bar's comfort level [1,19].
2.2 States and information

Nature can be in one of P states, which are labelled by a variable $\mu$ which takes integer values $\mu \in \mathcal{P} = \{1, 2, \ldots, P\}$. We assume that P is large and of the same order as N, and we define $\alpha = P/N$, which we eventually keep finite in the limit $N \to \infty$. The reason for this particular limit is that, as first shown in ref. [8], the model's behavior only depends on the combination $\alpha = P/N$ in the large N limit. The variable $\mu$ encodes all possible information on the state of the environment where agents live, so we shall sometimes call $\mu$ "information". $\mu$ is drawn from a distribution $\varrho^\mu$ on $\mathcal{P}$, independently at each time step. Most results will be presented for the uniform case $\varrho^\mu = 1/P$. In both the El Farol model and the minority game, $\mu$ has a different, more complex definition, to which we shall return in section 9. In what follows, we shall denote by an over-line, $\overline{O} = \sum_{\mu \in \mathcal{P}} \varrho^\mu O^\mu$, the average of a quantity $O^\mu$ over $\mu$. For any $\mu \in \mathcal{P}$, payoffs are still given by Eq. (1). Strictly speaking, $\mu$ is a so-called sunspot [20], because the payoffs only depend on the actions of agents. Now, however, the pure strategies of each agent may depend on $\mu$. We call $\mathcal{A}^P$ the set of all such strategies: an element of $\mathcal{A}^P$ is a function $a : \mu \in \mathcal{P} \to a^\mu \in \mathcal{A}$, or a P-dimensional vector with coordinates $a^\mu$, $\forall \mu \in \mathcal{P}$. There are $|\mathcal{A}^P| = 2^P$ possible such functions. We call $a_i \in \mathcal{A}^P$ a possible pure strategy for agent $i \in \mathcal{N}$, with elements $a_i^\mu \in \mathcal{A}$ for all $\mu \in \mathcal{P}$. With this notation, the payoff "matrix" reads $u_i(a_i, a_{-i}) = -\overline{a_i A}$, where $A^\mu = \sum_j a_j^\mu$. At this level we have just replicated the game of the previous level P times. Again there cannot be a best strategy $a \in \mathcal{A}^P$, for the same reasons as before.

2.3 Heterogeneous beliefs and strategies 

Now we assume that each agent restricts her choice to a small subset of S elements of $\mathcal{A}^P$. We use the vector notation $\vec a_i = (a_{1,i}, \ldots, a_{S,i})$ to denote the subset of strategies available to agent i, with elements $a_{s,i} \in \mathcal{A}^P$. The action of agent i, when the state is $\mu$ and she chooses her s-th strategy, shall then be $a^\mu_{s,i} \in \mathcal{A}$. We shall mainly work on the case where the strategies $a_{s,i}$ are randomly and independently drawn (with replacement) from $\mathcal{A}^P$. More precisely

$$P(a^\mu_{s,i} = +1) = P(a^\mu_{s,i} = -1) = \frac{1}{2}, \quad \forall i \in \mathcal{N},\ s \in \{1, \ldots, S\},\ \mu \in \mathcal{P}. \qquad (2)$$

Note that independence of $\vec a_i$ across agents is reasonable because $\mu$ is a sunspot and no pre-play communication is possible (agents are assigned their $\vec a_i$ before the game starts).

^3 We use the simple letter a without the index $\mu$ to denote the function.
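The utility of agent i, given the value of $\mu$, her choice $s_i$ and the choices of other agents $s_{-i} = \{s_j,\ j \ne i\}$, now becomes

$$u_i^\mu(s_i, s_{-i}) = -a^\mu_{s_i,i} A^\mu \quad \hbox{with} \quad A^\mu = \sum_{j \in \mathcal{N}} a^\mu_{s_j,j}. \qquad (3)$$

The goal of each agent is to maximize her expected payoff over all possible values of $\mu$, which, for agent i, reads

$$u_i(s_i, s_{-i}) = \overline{u_i^\mu(s_i, s_{-i})} = -\sum_{\mu=1}^{P} \varrho^\mu\, a^\mu_{s_i,i} A^\mu. \qquad (4)$$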
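
As an illustration of the structure defined so far, the sketch below draws random strategies as in Eq. (2) and evaluates the state-averaged payoff of an agent for a given choice of strategies. It is a minimal example under our own assumptions: the function names, the uniform weights 1/P for the states, and the small sizes in the usage lines are illustrative only.

```python
import random

def draw_strategies(N, S, P, rng):
    """a[i][s][mu] in {-1, +1}, every entry drawn independently with probability 1/2."""
    return [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(S)]
            for _ in range(N)]

def aggregate(a, choice, mu):
    """A^mu: sum over agents j of the action prescribed in state mu by strategy choice[j]."""
    return sum(a[j][choice[j]][mu] for j in range(len(a)))

def expected_payoff(a, choice, i):
    """Average over mu (uniform weights 1/P, an assumption) of -a^mu_{s_i,i} * A^mu."""
    P = len(a[0][0])
    return -sum(a[i][choice[i]][mu] * aggregate(a, choice, mu) for mu in range(P)) / P

rng = random.Random(0)                      # fixed seed, for reproducibility only
a = draw_strategies(N=5, S=2, P=8, rng=rng)
choice = [rng.randrange(2) for _ in range(5)]
print([expected_payoff(a, choice, i) for i in range(5)])
```

Summing the expected payoffs over all agents gives minus the average of $(A^\mu)^2$ over the states, which is the negative-sum property of the stage game carried over to this level.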

2.4 Motivation 

This structure was introduced in refs. [6,7] in order to model the inductive rational behavior of agents [1]. In a few words, the argument is the following: if agents do not have a completely detailed model of the game they are engaged in, they may think that the value of $\mu$ has some effect on the game's outcome A, possibly because they believe that other agents will believe the same. This is a self-reinforcing belief because if agents behave differently for different values of $\mu$, the aggregate outcome A will indeed depend on $\mu$, thus confirming agents' beliefs. The game's structure, however, is such that there can be no "commonality of expectations" [1]. This means that, since there is no rational best strategy $a_{\rm best}$ - because otherwise everybody would use it and lose - agents' expectations (or beliefs) are forced to differ.

On the basis of her mental schemes or hypotheses on the situation she faces, an agent may consider a particular strategy more "likely" to predict the "correct" action than another one. More precisely, she may consider that only the S forecasting rules $\vec a_i = (a_{1,i}, \ldots, a_{S,i})$, out of all the $2^P$ such rules in $\mathcal{A}^P$, are "reasonable" or compatible with her beliefs, and then restrict her choice to just those. Heterogeneous beliefs are represented by the fact that each agent draws her strategies at random, independently from others, using Eq. (2). Alternatively, one may think, following Aumann [21], that the $a_{s,i}$ are "rules of thumb" that agent i evolved or learned in other contexts and which she applies in this context. We refer the reader to refs. [1,21] for a deeper discussion of this behavior.

It is worth remarking that, in this view, the restriction from $2^P$ to S strategies is voluntary. It is not difficult to argue that agents have payoff incentives to increase the number of their strategies. In refs. [1,6,7], the state $\mu$ is known to agents before they take a decision. So why don't agents take a decision $s_i^\mu$ which depends on $\mu$ - or even an action $a_i^\mu$ which depends on $\mu$? The answer is that they do not do so because of computational costs that one is implicitly building into the model: we are assuming that agents, in complex strategic situations, prefer to simplify decision tasks.

While this motivation may seem reasonable for an issue such as going to a 
bar or not [1], it may not be appropriate for agents in financial markets as 
proposed in ref. [7]. 

On the other hand, this game's structure is justified if we assume that agents do not know the state $\mu$ before their choice. Indeed, if $a^\mu_{s,i}$ were known to agent i before taking her decision, it would be reasonable for her to decide her best action conditional on her private information $a^\mu_{s,i}$. If $\mu$ is not known in advance, one can imagine a situation where agents resort to S "devices" which take actions $a^\mu_{s,i}$ for them. The restriction in strategy space in this case is due to some implicit constraints or costs.

Rather than defending the behavioral assumptions of refs. [1,6,7] or pushing these arguments further, we prefer to remain at a generic level. Indeed, we believe the model displays collective phenomena which are of generic relevance, because of their statistical nature. Furthermore, this collective behavior is so rich that, in our opinion, it deserves investigation in its own right.

2.5 Notation: mixed strategies and averages 

Before coming to the analysis of the game, let us introduce mixed strategies. The mixed strategy $\pi_{s,i}$, $s = 1, \ldots, S$, of each agent i is a distribution over her available strategies: $\pi_{s,i}$ is, in other words, the probability with which agent i plays her s-th strategy. Again we use a vector notation $\vec\pi_i = (\pi_{1,i}, \ldots, \pi_{S,i}) \in \Delta_i$, where $\Delta_i$ is the S-dimensional simplex of the i-th agent. We introduce the scalar product $\vec u \cdot \vec v = \sum_{s=1}^{S} u_s v_s$ for vectors $\vec u, \vec v \in \mathbb{R}^S$. We also define the norm $|\vec v|^2 = \vec v \cdot \vec v$ of vectors in $\mathbb{R}^S$. Expectations on the mixed strategy of player i read $E_{\pi_i}(\vec v) = \vec\pi_i \cdot \vec v$. We also define the direct product $\Delta^N = \prod_{i \in \mathcal{N}} \Delta_i$, which we shall also call the phase space. A point $(\vec\pi_1, \ldots, \vec\pi_N) \in \Delta^N$ is indeed a possible state of the system. Finally we use the shorthand notation

$$\langle O \rangle = \sum_{s_1=1}^{S} \cdots \sum_{s_N=1}^{S} \pi_{s_1,1} \cdots \pi_{s_N,N}\, O_{s_1,\ldots,s_N} \qquad (5)$$

for the expectation on the product measure of mixed strategies over the phase space $\Delta^N$. For example, we shall frequently refer to the quantity

$$\langle A^\mu \rangle = \sum_{i \in \mathcal{N}} \vec\pi_i \cdot \vec a_i^\mu. \qquad (6)$$
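
The product-measure average of Eq. (5) can be evaluated by brute force for very small systems, which gives a direct check of Eq. (6). The sketch below is illustrative only: the function names and the small sizes are our own choices, and the enumeration over all $S^N$ strategy profiles is feasible only for a handful of agents.

```python
import itertools
import random

def expect_A_enumerated(a, pi, mu):
    """<A^mu> computed by summing over all S^N strategy profiles (definition, Eq. 5)."""
    N, S = len(a), len(a[0])
    total = 0.0
    for profile in itertools.product(range(S), repeat=N):
        weight = 1.0
        for i, s in enumerate(profile):
            weight *= pi[i][s]                       # product measure
        total += weight * sum(a[i][s][mu] for i, s in enumerate(profile))
    return total

def expect_A_formula(a, pi, mu):
    """<A^mu> = sum_i pi_i . a_i^mu (closed form, Eq. 6)."""
    return sum(pi[i][s] * a[i][s][mu]
               for i in range(len(a)) for s in range(len(a[0])))

rng = random.Random(0)
N, S, P = 4, 3, 5                                    # illustrative sizes
a = [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(S)] for _ in range(N)]
pi = []
for _ in range(N):
    w = [rng.random() for _ in range(S)]
    pi.append([x / sum(w) for x in w])               # a random point of the simplex
print(expect_A_enumerated(a, pi, 0), expect_A_formula(a, pi, 0))
```

The two numbers agree up to rounding, because the aggregate is a sum of single-agent terms and the measure factorizes over agents.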
3 Characterization of collective behavior

As a preliminary to a more detailed discussion, we find it useful to introduce the key quantities which describe the collective behavior. First, as a measure of global efficiency, we take

$$\sigma^2 = \overline{\langle A^2 \rangle}, \qquad (7)$$

where we recall that $\langle \ldots \rangle$ means expectation over the variables $s_i$ with the corresponding mixed strategy distribution $\pi_{s,i}$. Note that $\sigma^2 = -\sum_i \overline{\langle u_i \rangle}$ is just the total loss of agents. A small value of $\sigma^2$ implies an efficient coordination among agents.

By construction, the model is symmetric in the sense that, for any $\mu$, no particular sign of $A^\mu$ is a priori preferred. We shall see, however, that this symmetry can be "broken", resulting in a state where $A^\mu$ is more likely to take positive than negative values for some $\mu$, and vice-versa for other values of $\mu$. As a measure of this asymmetry, it is useful to introduce the quantity

$$H = \overline{\langle A \rangle^2} = \sum_{\mu \in \mathcal{P}} \varrho^\mu \langle A^\mu \rangle^2 = \sum_{\mu \in \mathcal{P}} \varrho^\mu \Big( \sum_{i \in \mathcal{N}} \vec\pi_i \cdot \vec a_i^\mu \Big)^2. \qquad (8)$$

Note that H > 0 implies that there is a best strategy $a^\mu_{\rm best} = -{\rm sign}\, \langle A^\mu \rangle$ that could, ideally, ensure a positive payoff to a newcomer agent. Ideally, because if the newcomer really starts playing the game, she will also affect the outcome $A^\mu$. This suggests that H can be regarded as a measure of the information content of the system that is exploitable by an external agent [11].

Both of these quantities are extensive, i.e. they are proportional to N for $N \to \infty$, and we shall mainly be interested in the finite quantities $\sigma^2/N$ and $H/N$. As a statistical characterization of the equilibrium, it is useful to introduce the self overlap

$$G = \frac{1}{N} \sum_{i \in \mathcal{N}} |\vec\pi_i|^2 = \frac{1}{N} \sum_{i \in \mathcal{N}} \sum_{s=1}^{S} \pi_{s,i}^2, \qquad (9)$$

which gives a measure of the average spread of mixed strategies played by agents. If all agents play pure strategies G = 1, whereas G = 1/S if $\pi_{s,i} = 1/S$ $\forall s, i$. Therefore 1/G is a measure of the "effective" number of strategies that agents play on average. Note that $\sigma^2/N$ can be written as

$$\frac{\sigma^2}{N} = \frac{H}{N} + 1 - G - \frac{1}{N} \sum_{i \in \mathcal{N}} \sum_{s \ne s'} \pi_{s,i} \pi_{s',i}\, \overline{a_{s,i} a_{s',i}} \simeq \frac{H}{N} + 1 - G, \qquad (10)$$

where in the last relation we neglected terms which vanish in the limit $N \to \infty$ (because $\overline{a_{s,i} a_{s',i}} \sim P^{-1/2}$ for $s \ne s'$). Eq. (10) means that the loss of agents comes either from the asymmetry H which they produce or from the stochastic fluctuations of their choices. Indeed, if agents play pure strategies, G = 1 and the last term vanishes. Put differently, the fluctuations $\sigma^2$ of the market - or volatility - have a systematic contribution H arising from unexploited asymmetries and a stochastic one, 1 - G, which is generated by the stochastic choices of agents.
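
The decomposition of the fluctuations into a systematic part, a stochastic part and a small cross term can be verified numerically for a tiny system, where the average over strategy profiles can be carried out exactly. The following sketch is a hedged illustration: the function name `averages`, the uniform weights 1/P over states and the system sizes are assumptions made for the example, not part of the original analysis.

```python
import itertools
import random

def averages(a, pi):
    """Return (sigma2, H, G, cross) for strategies a[i][s][mu] and mixed strategies pi[i][s].

    sigma2 and H are obtained by exact enumeration of all strategy profiles,
    with uniform weights 1/P over the states mu (an assumption of this sketch).
    """
    N, S, P = len(a), len(a[0]), len(a[0][0])
    sigma2, H = 0.0, 0.0
    for mu in range(P):
        mean_A, mean_A2 = 0.0, 0.0                  # <A^mu> and <(A^mu)^2>
        for profile in itertools.product(range(S), repeat=N):
            w = 1.0
            for i, s in enumerate(profile):
                w *= pi[i][s]
            A = sum(a[i][s][mu] for i, s in enumerate(profile))
            mean_A += w * A
            mean_A2 += w * A * A
        sigma2 += mean_A2 / P
        H += mean_A ** 2 / P
    G = sum(p * p for row in pi for p in row) / N   # self overlap
    # Cross term: sum_i sum_{s != r} pi_{s,i} pi_{r,i} * (state average of a_{s,i} a_{r,i})
    cross = sum(pi[i][s] * pi[i][r] * sum(a[i][s][mu] * a[i][r][mu] for mu in range(P)) / P
                for i in range(N) for s in range(S) for r in range(S) if s != r)
    return sigma2, H, G, cross

rng = random.Random(0)
N, S, P = 4, 2, 6                                   # illustrative sizes
a = [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(S)] for _ in range(N)]
pi = []
for _ in range(N):
    w = [rng.random() for _ in range(S)]
    pi.append([x / sum(w) for x in w])

sigma2, H, G, cross = averages(a, pi)
print(sigma2, H + N * (1 - G) - cross)              # both sides of the exact identity
```

For such a small P the cross term is not negligible; the point of the example is only that the identity holds exactly before the large-N approximation is made.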



4 Nash Equilibria

Given the payoffs and the choices available to agents, we now briefly discuss Nash equilibria of the stage game. The motivation for this section is that, on the one hand, Nash equilibria shall provide a reference framework for the following discussion. On the other hand, this discussion allows one to appreciate the strategic complexity of the problem. For our purposes, Nash equilibria are those states which are stable under payoff incentives, given the choices available to agents. We shall not discuss refinements. We shall remain at a quite simple level, without any claim to completeness or rigor.

Given the symmetry of the game, let us first look for symmetric Nash equilibria at the level of actions only (i.e. P = 1) with no restriction on strategies ($\vec a_i = \mathcal{A}$ for all $i \in \mathcal{N}$). These cannot be in pure actions, so let us look for Nash equilibria in mixed actions: each player plays $a_i = +1$ with probability $\pi_i$ and $a_i = -1$ otherwise. It is easy to see that $\pi_i = 1/2$ is the only symmetric Nash equilibrium: no agent has incentives to deviate from the choice $\pi_i = 1/2$ if the others stick to it. This state, which we shall call the random agent state, is usually taken as a reference state [8,13] and it is characterized by $\sigma^2 = N$ and H = 0.

The game has many more Nash equilibria than the symmetric one. For example, in pure actions, any state where |A| = 1 and N is odd (or A = 0 and N is even) is a Nash equilibrium. Indeed, focusing on N odd, an agent in the minority (playing $a_i = -A$) would decrease her payoff by switching to the majority side. On the other hand, agents in the majority cannot increase their payoff by changing from $a_i = A$ to $a_i = -A$ because then the majority would also change, $A \to -A$. The number $\Omega_{\rm Nash}$ of these Nash equilibria is

$$\Omega_{\rm Nash} = \binom{N}{\frac{N-1}{2}} + \binom{N}{\frac{N+1}{2}}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad (11)$$

which is exponentially large in N. Each of these states is globally optimal, since it has no fluctuations: $\sigma^2 = H = 1$, as compared to $\sigma^2 = N$, H = 0 in the symmetric Nash equilibrium.
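
For small N the counting of pure-action Nash equilibria can be confirmed by exhaustive enumeration. The sketch below is illustrative; it treats a profile as a Nash equilibrium when no agent can strictly increase her payoff by switching sides, and the function names are our own.

```python
from itertools import product
from math import comb

def is_nash(actions):
    """True if no single agent can strictly improve her payoff -a_i * A by switching."""
    A = sum(actions)
    for a in actions:
        current = -a * A
        deviated = -(-a) * (A - 2 * a)    # she plays -a, so the aggregate becomes A - 2a
        if deviated > current:
            return False
    return True

def count_pure_nash(N):
    """Number of pure-action Nash equilibria among all 2^N action profiles."""
    return sum(1 for profile in product((-1, 1), repeat=N) if is_nash(profile))

for N in (3, 5, 7):
    print(N, count_pure_nash(N), comb(N, (N - 1) // 2) + comb(N, (N + 1) // 2))
```

For odd N the enumeration reproduces the sum of the two binomial coefficients; for even N the same function counts the profiles with A = 0, of which there are $\binom{N}{N/2}$.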

There are many more Nash equilibria at this simple level^4. These simple considerations are already enough to appreciate the complexity of the problem. Note in particular that virtually every possible collective behavior, as parametrized by $\sigma^2$ and H, is possible.

The complexity increases when P > 1 and agents still have no restriction on strategies ($\vec a_i = \mathcal{A}^P$ for all $i \in \mathcal{N}$). Again we have the symmetric Nash equilibrium - the random agents state - where, for any $\mu$, each agent plays $a_i^\mu = +1$ with probability 1/2. The number of Nash equilibria in pure actions is huge. Indeed, any combination of P pure-action Nash equilibria at the level of actions (one for each value of $\mu$) is a Nash equilibrium. There are therefore $\Omega^{(P)}_{\rm Nash} = (\Omega_{\rm Nash})^P$ such equilibria, each of them with minimal fluctuations $\sigma^2 = 1$. Again there are many more Nash equilibria.

When the strategies of agents are restricted to the sets $\vec a_i$ and S is small (typically finite when $N, P \to \infty$), the problem of identifying Nash equilibria becomes more complex. One way to tackle the problem is to write down the multi-population standard replicator dynamics [22] and then identify Nash equilibria in evolutionarily stable strategies by its stationary and stable points. Again we make no claim of completeness: we just focus on a particular subclass of Nash equilibria^5 - those which are evolutionarily stable - which shall play a peculiar role in the following.



^4 Consider e.g. splitting the population into two groups: K agents playing pure actions and N - K playing symmetric mixed actions.

^5 These are strict Nash equilibria.

The multi-population standard replicator dynamics [22] (RD) reads

$$\frac{d\pi_{s,i}}{dt} = -\pi_{s,i} \sum_{j \ne i} \left[ \overline{a_{s,i} (\vec\pi_j \cdot \vec a_j)} - \overline{(\vec\pi_i \cdot \vec a_i)(\vec\pi_j \cdot \vec a_j)} \right]. \qquad (12)$$

We observe that $\sigma^2 = \sum_{i} \sum_{j \ne i} \overline{(\vec\pi_i \cdot \vec a_i)(\vec\pi_j \cdot \vec a_j)} + N$ is a Lyapunov function under this dynamics. Indeed, a little algebra leads to

$$\frac{d\sigma^2}{dt} = -2 \sum_{i \in \mathcal{N}} \sum_{s=1}^{S} \pi_{s,i} \left[ \overline{(a_{s,i} - \vec\pi_i \cdot \vec a_i) \sum_{j \ne i} \vec\pi_j \cdot \vec a_j} \right]^2 \le 0. \qquad (13)$$

Therefore Nash equilibria are local minima of $\sigma^2$ in $\Delta^N$. Furthermore, $\sigma^2$ is a linear function of $\pi_{s,i}$ for any i, s, so that $\partial^2 \sigma^2 / \partial \pi_{s,i}^2 = 0$, $\forall s, i$. Therefore $\sigma^2$ is a harmonic function in $\Delta^N$, which implies that the minima are on the boundary of $\Delta^N$. This holds for any subset of variables $\pi_{s,i}$, which therefore implies that the minima are located in the corners of the simplex, i.e. Nash equilibria are in pure strategies (G = 1). This, in turn, implies that Nash equilibria have $\sigma^2 \simeq H$ by Eq. (10).
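
The Lyapunov property can be illustrated numerically by integrating the replicator dynamics with a small time step and monitoring $\sigma^2$. The sketch below is not the procedure used in the paper: the explicit Euler scheme, the step size, the uniform weights 1/P over states and all function names are assumptions made for this illustration.

```python
import random

def local_means(a, pi):
    """m[i][mu] = pi_i . a_i^mu, the average action of agent i in state mu."""
    return [[sum(p * a_s[mu] for p, a_s in zip(pi_i, a_i)) for mu in range(len(a_i[0]))]
            for a_i, pi_i in zip(a, pi)]

def sigma2(a, pi):
    """sigma^2 = N + sum_{i != j} of the state average of (pi_i.a_i)(pi_j.a_j)."""
    m = local_means(a, pi)
    N, P = len(m), len(m[0])
    cross = 0.0
    for mu in range(P):
        tot = sum(m[i][mu] for i in range(N))
        cross += tot * tot - sum(m[i][mu] ** 2 for i in range(N))
    return N + cross / P

def rd_step(a, pi, dt):
    """One explicit Euler step of the replicator dynamics (discretization is an assumption)."""
    m = local_means(a, pi)
    N, S, P = len(a), len(a[0]), len(a[0][0])
    tot = [sum(m[i][mu] for i in range(N)) for mu in range(P)]
    new_pi = []
    for i in range(N):
        # h[s]: state average of a_{s,i} times the mean action of all other agents
        h = [sum(a[i][s][mu] * (tot[mu] - m[i][mu]) for mu in range(P)) / P
             for s in range(S)]
        h_avg = sum(p * x for p, x in zip(pi[i], h))
        new_pi.append([p * (1.0 - dt * (x - h_avg)) for p, x in zip(pi[i], h)])
    return new_pi

def run(N=6, S=2, P=4, dt=0.01, steps=500, seed=0):
    """Return the values of sigma^2 along one trajectory from a random interior point."""
    rng = random.Random(seed)
    a = [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(S)] for _ in range(N)]
    pi = []
    for _ in range(N):
        w = [rng.random() + 0.1 for _ in range(S)]
        pi.append([x / sum(w) for x in w])
    trajectory = [sigma2(a, pi)]
    for _ in range(steps):
        pi = rd_step(a, pi, dt)
        trajectory.append(sigma2(a, pi))
    return trajectory

trajectory = run()
print(trajectory[0], trajectory[-1])
```

With a sufficiently small step the recorded values of $\sigma^2$ do not increase along the trajectory, in line with Eq. (13); a larger step could violate this, since the monotonicity is a property of the continuous-time flow.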

A detailed characterization of these Nash equilibria shall be given elsewhere [23]. Here we briefly mention that in this case, too, Nash equilibria are exponentially many in N. This makes the analytic calculation a step more difficult than the one we shall present later. A simplified approximate calculation (see appendix) gives the following lower bound^6:

$$\frac{\sigma^2}{N} \ge \begin{cases} \ldots & \hbox{for } \alpha > z(S)^2 \\ \ldots & \hbox{for } \alpha < z(S)^2 \end{cases} \qquad (14)$$

where $\alpha = P/N$ and $z(S) = S \sqrt{2/\pi} \int_{-\infty}^{\infty} dz\, z\, e^{-z^2} [1 - {\rm erfc}(z)/2]^{S-1}$ is the expected value of the maximum among S standard Gaussian random variables (for $S \gg 1$, $z(S) \simeq \sqrt{2 \ln S}$).
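
The quantity z(S), the expected value of the maximum among S independent standard Gaussian variables, is easy to evaluate numerically. The sketch below uses the integral representation $z(S) = S\sqrt{2/\pi}\int dx\, x\, e^{-x^2}[1 - {\rm erfc}(x)/2]^{S-1}$, which follows from the density of the maximum after rescaling the integration variable by $\sqrt{2}$; the trapezoidal rule, the integration range and the function name are choices made for this illustration.

```python
import math

def z(S, half_width=8.0, steps=16000):
    """Expected maximum of S independent standard Gaussian variables.

    Trapezoidal evaluation of S*sqrt(2/pi) * integral of
    x * exp(-x^2) * [1 - erfc(x)/2]^(S-1) over [-half_width, half_width].
    """
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        f = x * math.exp(-x * x) * (1.0 - math.erfc(x) / 2.0) ** (S - 1)
        total += f if 0 < k < steps else 0.5 * f    # end points carry weight 1/2
    return S * math.sqrt(2.0 / math.pi) * total * dx

for S in (2, 3, 4, 100):
    print(S, z(S), math.sqrt(2.0 * math.log(S)))     # compare with sqrt(2 ln S)
```

For S = 2 the integral can be done by hand and gives $1/\sqrt{\pi}$, which provides a simple check of the numerics. The comparison with $\sqrt{2 \ln S}$ shows that the asymptotic form is approached rather slowly.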

Figure 1 shows that the lower bound is already a good approximation to the typical value of $\sigma^2$ in the Nash equilibrium, especially for small values of S. Eq. (14) implies that, for fixed S, $\sigma^2$ increases with $\alpha$, which is reasonable because the complexity of information increases while the resources of agents are limited by S. For fixed $\alpha$, Eq. (14) suggests that $\sigma^2$ decreases with S. So if agents are given more resources (larger S), they attain a better equilibrium. Both of these features are confirmed by numerical simulations (see fig. 1).

It is worth pointing out that the game specified by the payoffs of Eq. (4), for N and P very large, implies a fantastic computational complexity. Deductive rational agents should be able to master a chain of logical deductions of formidable complexity in order to derive their best response. The efforts required by this strategic situation may well exceed the bounds of memory and computational capabilities of any realistic agent, or more simply the resources she is likely to devote to the problem. Furthermore, her assumption that everybody else behaves as a rational deductive player becomes more and more unrealistic as N grows large. Finally, even with deductive rational agents there would still be the problem of equilibrium selection which, in this case, involves a huge number of possible equilibria.

^6 Strictly speaking, the meaning of this lower bound is that the probability to observe $\sigma^2/N$ smaller than the lower bound decreases exponentially with N.

Fig. 1. Global efficiency $\sigma^2/N$ as a function of $\alpha$ for S = 2, 3, and 4 from numerical simulations with P = 128 averaged over 100 realizations of $\vec a_i$ (symbols) and from the theoretical lower bound (lines).



5 Repeated game: Learning and inductive rationality 



Deductive rationality, as suggested in refs. [1,6,7], is unrealistic in such complex strategic situations^7 and it has to be replaced by inductive rationality. This amounts to assuming that agents try to learn what their best choice is from their past performance. We henceforth focus on the repeated game in which agents meet again and again to play the stage game of section 2.3. Different stage games are distinguished by the time label $t \in \mathbb{N}$. For example, $s_i(t)$ denotes the strategy chosen by agent i at time t and $\mu(t)$ the information available at that time.

^7 We do not discuss learning and inductive rationality in simpler cases such as e.g. P = 1 and $\vec a_i = \mathcal{A}$ for all $i \in \mathcal{N}$. For a discussion of reinforcement learning in the El Farol problem at this level see R. Franke (1999).
5.1 Exponential learning

It is generally accepted that agents are more likely to follow strategies which have been more successful in the past, which is known [2,24] as the "law of effect". There are several behavioral models implementing the law of effect (see e.g. [2,24]). Here we assume that agents follow an exponential learning behavior: each agent i assigns scores $U_{s,i}(t)$ to each of her strategies s = 1, ..., S and she plays strategy s with a probability which depends exponentially on its score:

$$\pi_{s,i}(t) = \frac{e^{\Gamma_i U_{s,i}(t)}}{\sum_{s'=1}^{S} e^{\Gamma_i U_{s',i}(t)}}, \qquad (15)$$

where $\Gamma_i > 0$ is a numerical constant, which may differ for each agent $i \in \mathcal{N}$. This model for discrete choice - called the Logit model - has a long tradition in economics^8 [25] and some experimental support (see e.g. [26]). The MG was originally introduced with $\Gamma_i = \infty$ [6-8] and only recently has it been generalized to $\Gamma_i < \infty$ [27]. Note that $\pi_{s,i}$ here is no longer a mixed strategy - which is the object of agents' strategic choice - but rather it encodes a particular behavioral model.
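
The exponential (Logit) choice rule can be written in a few lines. The sketch below is illustrative: the function name and the sample scores are our own, and subtracting the largest score before exponentiating is a standard numerical precaution that leaves the probabilities unchanged.

```python
import math

def logit_probabilities(scores, gamma):
    """Probabilities proportional to exp(gamma * U_s), computed in a numerically stable way."""
    shift = max(scores)                               # cancels in the ratio
    weights = [math.exp(gamma * (u - shift)) for u in scores]
    norm = sum(weights)
    return [w / norm for w in weights]

scores = [0.3, -0.1, 0.0]                             # illustrative scores for S = 3
for gamma in (0.0, 1.0, 100.0):
    print(gamma, logit_probabilities(scores, gamma))
```

For gamma = 0 the choice is uniform, while for large gamma the probability concentrates on the strategy with the highest score, which corresponds to the limit in which the MG was originally introduced. The ratio of the probabilities of two strategies depends only on the difference of their scores, so it is unaffected by adding or removing a third strategy.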

At time t = 0 scores are set to some $U_{s,i}(0)$, which encodes prior beliefs: e.g. $U_{s,i}(0) > U_{s',i}(0)$ means that agent i considers strategy s a priori more likely to be successful than s'.

At later times t > 0, agents update the scores of each of their strategies 
s = 1, . . . , S" in an additive way 

UsAt+l) = UsAt) + ^uff\t,s,{t),s.,{t)], (16) 



where the reinforcement $\Delta U_{s,i}^{\mu(t)}[t, s_i(t), s_{-i}(t)]$ quantifies the "perceived" success
of each of their strategies $s = 1, \ldots, S$ at time $t$. This generally depends
both on the state $\mu(t)$ and on the strategies $s_i(t)$ and $s_{-i}(t)$ played at that
time by agent $i$ and by her opponents.



This model is quite appealing since it satisfies the axiom of independence from
irrelevant alternatives, which states that the relative odds of choices $s$ and $s'$ do
not depend on whether another choice $s''$ is possible or not.

5.2 Naive and sophisticated agents



What is the perceived success $\Delta U_{s,i}^{\mu(t)}$ of a strategy $s$? The most natural way to
quantify the success of a strategy is by the payoff it delivers to the agent if
played. This suggests that $\Delta U_{s,i}^{\mu(t)} = u_i^{\mu(t)}[s, s_{-i}(t)]/P$, or

$$U_{s,i}(t+1) = U_{s,i}(t) + u_i^{\mu(t)}[s, s_{-i}(t)]/P. \tag{17}$$



By this equation, however, one assumes that agent $i$ knows what payoff she
would have received had she played any strategy $s$, including those $s \neq s_i(t)$
which were not used. In other words, agents must have full information on the
effects (payoffs) of all of their strategies. Furthermore, agents take into account
the way in which the aggregate quantity $A^{\mu(t)}$ would have changed if they had
played strategy $s \neq s_i(t)$. Agents following Eq. (17) are capable of sophisticated
counter-factual thinking, and are henceforth called sophisticated agents. It is
worth observing that the score, with the dynamics (17), acquires the meaning
of cumulated payoff: $U_{s,i}(t)$ is indeed the payoff agent $i$ would have received
(divided by $P$) if she had always played strategy $s$ against her opponents'
strategies $s_{-i}(t')$ for all $t' < t$.

In the minority game agents are naive [4]: i) they may have only partial
information, which means that they only know the payoff delivered by the
strategy $s_i(t)$ which they actually played; ii) they behave as if they were
playing against an exogenous signal $A^{\mu(t)}(t)$, rather than against $N - 1$ other agents.
Naive agents neglect their impact on the aggregate and update scores as
if $A^{\mu(t)}$ would not have changed had they used a different strategy. More precisely

$$\Delta U_{s,i}^{\mu(t)}[t, s_i(t), s_{-i}(t)] = -a_{s,i}^{\mu(t)} A^{\mu(t)}(t)/P. \tag{18}$$



Note that Eq. (17) does not depend on the strategy $s_i(t)$ which agent $i$ used,
whereas Eq. (18) depends on it because $A^{\mu(t)}(t)$ contains the action $a_{s_i(t),i}^{\mu(t)}$
which agent $i$ actually played. Regarding the MG as a toy market, if the
aggregate $A^{\mu(t)}(t)$ plays the role of price, Eq. (18) implies that agents behave
as "price takers": they behave as if price did not depend on their actions. By so
doing, they considerably simplify the strategic complexity of the context they
face. Eq. (18) may be a closer approximation than Eq. (17) to the behavior of
real agents in complex strategic situations.
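The gap between the two reinforcements can be made explicit with a short worked equation. It assumes the stage-game payoff $u_i^{\mu} = -a_{s_i,i}^{\mu} A^{\mu}$ with actions $a_{s,i}^{\mu} = \pm 1$, and it introduces the shorthand $A_{-i}^{\mu(t)}(t)$ for the aggregate action of the other $N - 1$ agents.

```latex
\begin{align*}
A^{\mu(t)}(t) &= A_{-i}^{\mu(t)}(t) + a_{s_i(t),i}^{\mu(t)}, \\
u_i^{\mu(t)}[s, s_{-i}(t)]
  &= -a_{s,i}^{\mu(t)} \left[ A_{-i}^{\mu(t)}(t) + a_{s,i}^{\mu(t)} \right] \\
  &= -a_{s,i}^{\mu(t)} A^{\mu(t)}(t) + a_{s,i}^{\mu(t)} a_{s_i(t),i}^{\mu(t)} - 1 .
\end{align*}
```

For the played strategy, $s = s_i(t)$, the last two terms cancel and the two reinforcements coincide. For $s \neq s_i(t)$ the naive reinforcement exceeds the payoff-based one by $1 - a_{s,i}^{\mu(t)} a_{s_i(t),i}^{\mu(t)}$, which is either 0 or 2: a naive agent systematically over-rates the strategies she did not play.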



The factor $1/P$ is introduced here and in the following equations for convenience.
The reason will become clear in the next section.

This term, as opposed to naive, is borrowed from ref. [4].

And probably also in the El Farol problem; ref. [1] is not very clear on this point.

One would naively expect that when $N \to \infty$ the difference between Eqs.
(17) and (18) is negligible. Indeed the relative impact of an agent on $A^{\mu}$ is
negligible in that limit. Surprisingly, we shall see that this is not so, and a
system of naive agents behaves quite differently from sophisticated agents with
full information. In order to study the effect of the impact of an agent's choice
on the aggregate, we generalize Eq. (18) by including a term $+\eta\,\delta_{s,s_i(t)}/P$. The
dynamics of scores then reads

$$U_{s,i}(t+1) = U_{s,i}(t) - a_{s,i}^{\mu(t)} A^{\mu(t)}(t)/P + \eta\,\delta_{s,s_i(t)}/P. \tag{19}$$

The last term, which is absent ($\eta = 0$) in the original definition of the MG,
models the tendency of agents to stick to the strategy they are currently using.
Indeed $\eta > 0$ implies that agents reward the strategy they use, $s = s_i(t)$, with
respect to those they are not currently using, $s \neq s_i(t)$. By doing this, agents
approximately account for the impact of their actions on the global variable
$A^{\mu(t)}(t)$. As we shall see, this term has very deep consequences.
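The score dynamics of Eq. (19), combined with the choice rule of Eq. (15), can be simulated directly. The following is a minimal sketch rather than the code used for the figures: the function name `simulate`, its default parameters, the flat prior $U_{s,i}(0) = 0$, the uniform exogenous draw of $\mu(t)$ and the estimate of $\sigma^2$ as the time average of $A^2$ over the second half of the run are all assumptions made for illustration.

```python
import math
import random

def simulate(N=31, P=16, S=2, eta=0.0, gamma=0.5, steps=4000, seed=0):
    """Toy run of the score dynamics of Eq. (19); returns sigma^2 / N."""
    rng = random.Random(seed)
    # a[i][s][mu]: action (+1 or -1) of strategy s of agent i in state mu.
    a = [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(S)]
         for _ in range(N)]
    U = [[0.0] * S for _ in range(N)]   # scores, with flat prior beliefs
    acc, count = 0.0, 0
    for t in range(steps):
        mu = rng.randrange(P)           # state drawn uniformly at random
        played = []
        for i in range(N):
            top = max(U[i])
            w = [math.exp(gamma * (u - top)) for u in U[i]]    # Eq. (15)
            played.append(rng.choices(range(S), weights=w, k=1)[0])
        A = sum(a[i][played[i]][mu] for i in range(N))
        for i in range(N):
            for s in range(S):
                reward = eta if s == played[i] else 0.0
                U[i][s] += (-a[i][s][mu] * A + reward) / P     # Eq. (19)
        if t >= steps // 2:             # discard the first half as transient
            acc += A * A
            count += 1
    return acc / (count * N)

if __name__ == "__main__":
    for eta in (0.0, 1.0):
        print(eta, simulate(eta=eta))
```

Running the sketch for several values of `eta` at fixed `N` and `P` gives a rough, finite-size picture of how the reward term affects the fluctuations of the aggregate.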

We shall consider these two cases, partial information with naive agents
and full information, separately below, and see that the collective behavior
is remarkably different. Before doing that, we shall first discuss the dynamics
of scores in the long run.



6 Continuum time limit and the dynamics in the long run 

In this section, we shall first derive a continuum time dynamics for Eq. (16)
which captures the long run behavior of the system. Then we shall show that
the collective behavior of agents, within this continuum time dynamics, admits
a Lyapunov function, i.e. a function which never increases along the trajectories
of the dynamics of the system. The dynamics therefore converges to the
minima of this function. This is an important step, since it allows us to
turn the study of the stationary state of the dynamical model into the study
of the local minima of the Lyapunov function. One can therefore regard the
Lyapunov function as the Hamiltonian of a system and resort to the powerful
tools of statistical mechanics in order to study the statistical properties of its
ground state (global minimum) and possibly of its meta-stable states (local
minima). This shall be the subject of the next section.

In order to study the stationary state properties of the system, we need to
consider the long time limit of the dynamics of scores, Eq. (16). The key
observation, in this respect, is that we expect $U_{s,i}(t)$ to change significantly
and systematically only over time-scales of order $\Delta t \sim P$. Indeed the scores
of strategies depend on their performance on all the $P$ states $\mu$. In order to
capture the long time dynamics of scores, let us set

$$\tilde U_{s,i}(\tau) = U_{s,i}(t), \quad \text{with } \tau = \frac{t}{P}. \tag{20}$$



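The dynamics in continuum time of $\tilde U_{s,i}$ is obtained by iterating Eq. (16) for
$\Delta t = P\,d\tau$ time steps:

$$\frac{\tilde U_{s,i}(\tau + d\tau) - \tilde U_{s,i}(\tau)}{d\tau} = \frac{1}{P\,d\tau} \sum_{t=P\tau}^{P(\tau+d\tau)-1} P\,\Delta U_{s,i}^{\mu(t)}[t, s_i(t), s_{-i}(t)] \tag{21}$$

Here the explicit factor $P$ compensates the factor $1/P$ contained in $\Delta U_{s,i}^{\mu(t)}$, so that
each term of the sum is of order one.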



We can now take the thermodynamic limit $N, P \to \infty$ keeping $d\tau$ finite. By
the law of large numbers, the right hand side converges almost surely to its
average value (see later). Here $\Delta U_{s,i}^{\mu(t)}$ is a function of the random variables
$\mu(t)$ and $\{s_j(t),\, j \in \mathcal{N}\}$, which are chosen independently at each time step.
If the stochastic fluctuations in $U_{s,i}$ are small (see later), the distribution
$\pi_{s,i}(t)$ will also be well behaved in the limit $P \to \infty$, especially if $\Gamma_i$ is small.
Indeed, defining $\tilde\pi_{s,i}(\tau) = \pi_{s,i}(t = P\tau)$, Eq. (15) becomes

$$\tilde\pi_{s,i}(\tau) = \frac{e^{\Gamma_i \tilde U_{s,i}(\tau)}}{\sum_{s'} e^{\Gamma_i \tilde U_{s',i}(\tau)}} \tag{22}$$



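which remains meaningful in the limit $P \to \infty$. Equivalently we may say
that the distribution $\tilde\pi_{s,i}$ remains approximately constant over $P\,d\tau$ time steps.
Taking the continuum time limit $d\tau \to 0$ in Eq. (21), we find

$$\frac{d\tilde U_{s,i}}{d\tau} = \langle \Delta U_{s,i} \rangle \tag{23}$$

where averages are taken with respect to the distributions $\pi_j$ of strategies and
$\rho^\mu$ of $\mu$, and $\langle \Delta U_{s,i} \rangle$ denotes the average of the rescaled increment
$P\,\Delta U_{s,i}^{\mu}$, which is of order one.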

It is worth pointing out that the order of the two limits, first $P \to \infty$
and then $d\tau \to 0$, is quite important. Indeed the infinitesimal time interval
$d\tau$ corresponds to a very large number $P\,d\tau$ of time steps, which eventually
diverges. This implies that the characteristic time of the system is proportional
to $P$ time steps.



Interestingly, numerical simulations show that the continuum time approximation
generally works also in the limit $\Gamma_i \to \infty$.

In other words, $P$ repetitions of the game are the analogue of a "sweep" of the
system in a Monte Carlo simulation.

The validity of the law of large numbers can be verified by studying the fluctuations
of $\Delta U_{s,i}^{\mu(t)}$ around its average $\langle \Delta U_{s,i} \rangle$. By routine use of the Chebyshev
inequality, it is enough to show that

$$\frac{1}{(P\,d\tau)^2} \sum_{t,t'=P\tau}^{P(\tau+d\tau)-1} \langle \delta U_{s,i}(t)\, \delta U_{s,i}(t') \rangle \tag{24}$$

vanishes when $N, P \to \infty$, where $\delta U_{s,i}(t)$ is the deviation of the rescaled increment
$P\,\Delta U_{s,i}^{\mu(t)}[t, s_i(t), s_{-i}(t)]$ from its average. This is indeed the case because $\Delta U_{s,i}^{\mu(t)}$ depends
on the random variables $\mu(t)$ and $\{s_j(t),\, j \in \mathcal{N}\}$, which are drawn independently
at each $t$ from their distributions $\rho^\mu$ and $\{\pi_j,\, j \in \mathcal{N}\}$ respectively.
Only the terms with $t = t'$ or $\mu(t) = \mu(t')$ contribute to the average, whereas
all other terms vanish because the average factorizes. Only a fraction $\sim 1/P$
of the terms is non-vanishing, which implies that the expression (24) is indeed
of order $1/P$ and it vanishes as claimed.

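We shall henceforth work with continuum time and drop the tilde over $U$ and
$\pi$, in order to simplify notation. Combining Eqs. (15) and (23), with a
little algebra, one finds that $\pi_{s,i}$ satisfies the equation

$$\frac{d\pi_{s,i}}{d\tau} = \Gamma_i \pi_{s,i} \left[ \frac{dU_{s,i}}{d\tau} - \vec\pi_i \cdot \frac{d\vec U_i}{d\tau} \right] = \Gamma_i \pi_{s,i} \left[ \langle \Delta U_{s,i} \rangle - \vec\pi_i \cdot \langle \Delta \vec U_i \rangle \right]. \tag{25}$$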



6.1 Lyapunov function for naive agents



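The dynamics (19) of naive agents in continuum time is easily derived by combining
Eqs. (19) and (25) with a little algebra. It reads

$$\frac{d\pi_{s,i}}{d\tau} = -\Gamma_i \pi_{s,i} \left[ \overline{a_{s,i} \langle A \rangle} - \overline{\langle a_i \rangle \langle A \rangle} - \eta \left( \pi_{s,i} - \vec\pi_i^{\,2} \right) \right] \tag{26}$$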



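with the shorthand $\langle a_i \rangle = \vec\pi_i \cdot \vec a_i$. This is different from RD (Eq. 12), so $\vec\pi_i$
does not converge, in general, to a Nash equilibrium. Rather one can show
that

$$H_\eta = \frac{1}{2}\, \overline{\langle A \rangle^2} - \frac{\eta}{2} \sum_{i \in \mathcal{N}} \vec\pi_i^{\,2} \tag{27}$$

is a Lyapunov function of this dynamics. Indeed, observing that

$$\frac{\partial H_\eta}{\partial \pi_{s,i}} = -\frac{dU_{s,i}}{d\tau}$$

and using Eq. (25), one finds that

$$\frac{dH_\eta}{d\tau} = \sum_{i \in \mathcal{N}} \frac{\partial H_\eta}{\partial \vec\pi_i} \cdot \frac{d\vec\pi_i}{d\tau} = -\sum_{i \in \mathcal{N}} \Gamma_i \sum_{s=1}^{S} \pi_{s,i} \left( \frac{\partial H_\eta}{\partial \pi_{s,i}} - \vec\pi_i \cdot \frac{\partial H_\eta}{\partial \vec\pi_i} \right)^2 \le 0. \tag{28}$$

The dynamics therefore converges to the minima of $H_\eta$.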
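The monotonic decrease of the Lyapunov function can be checked numerically. The sketch below integrates the naive-agent dynamics with an explicit Euler scheme for a random realization of the strategies and records $H_\eta$ after every step. It takes $H_\eta = \frac{1}{2}\overline{\langle A \rangle^2} - \frac{\eta}{2}\sum_i \vec\pi_i^{\,2}$ (this normalization is an assumption of the sketch), uses the same $\Gamma_i$ for all agents, and all function names and parameter values are hypothetical.

```python
import random

def h_eta(pi, a, eta):
    """H_eta = (1/2) * mean_mu <A^mu>^2 - (eta/2) * sum_i |pi_i|^2."""
    N, S, P = len(a), len(a[0]), len(a[0][0])
    mean_sq = sum(
        sum(pi[i][s] * a[i][s][mu] for i in range(N) for s in range(S)) ** 2
        for mu in range(P)) / P
    return 0.5 * mean_sq - 0.5 * eta * sum(p * p for row in pi for p in row)

def euler_step(pi, a, eta, gamma, dt):
    """One Euler step of d pi/d tau = -gamma * pi * (g - pi.g), with g = dH/dpi."""
    N, S, P = len(a), len(a[0]), len(a[0][0])
    avg_A = [sum(pi[i][s] * a[i][s][mu] for i in range(N) for s in range(S))
             for mu in range(P)]
    new_pi = []
    for i in range(N):
        g = [sum(a[i][s][mu] * avg_A[mu] for mu in range(P)) / P
             - eta * pi[i][s] for s in range(S)]
        g_bar = sum(pi[i][s] * g[s] for s in range(S))
        new_pi.append([pi[i][s] * (1.0 - gamma * dt * (g[s] - g_bar))
                       for s in range(S)])
    return new_pi

def run(N=12, S=2, P=8, eta=0.0, gamma=1.0, dt=0.01, steps=300, seed=1):
    """Return the list of H_eta values along one Euler trajectory."""
    rng = random.Random(seed)
    a = [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(S)]
         for _ in range(N)]
    pi = []
    for _ in range(N):
        w = [rng.uniform(0.2, 1.0) for _ in range(S)]
        pi.append([x / sum(w) for x in w])   # random point inside the simplex
    values = [h_eta(pi, a, eta)]
    for _ in range(steps):
        pi = euler_step(pi, a, eta, gamma, dt)
        values.append(h_eta(pi, a, eta))
    return values

if __name__ == "__main__":
    vals = run()
    print(vals[0], vals[-1])
```

The Euler scheme preserves the normalization of each $\vec\pi_i$ exactly; a small step `dt` is needed for the discrete trajectory to inherit the monotonic decrease of the continuum dynamics.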

This equation implies that in the stationary state ($dH_\eta/d\tau = 0$) each of the
strategies played by agent $i$, i.e. those with $\pi_{s,i} > 0$, has the same perceived
success in the long run. Note also that for $\eta = 1$, by Eq. (10), $H_1 \propto \sigma^2 - N$,
which implies that for $\eta = 1$ the stationary state is close to a Nash equilibrium.

We shall come back to the statistical characterization of the stationary state
for naive agents in the next section. It is worth stressing at this point that
this result holds for any realization of the strategies $a_{s,i}^\mu$. It actually holds for much more
general models [28].

6.2 Agents with full information 

It is known [24] that exponential learning with full information, for a single
agent playing against a stationary stochastic process, converges to rational
expectations. If we can regard the opponents of $i$ as a stationary process, this
implies that the strategy of agent $i$ converges to the best response. If this happens for
all players the system converges to a Nash equilibrium. This is indeed what
numerical experiments show (see figure 1).

We can recover this result within the continuum time limit. Indeed, after some
algebra one finds that Eq. (25) becomes, in this case

$$\frac{d\pi_{s,i}}{d\tau} = -\Gamma_i \pi_{s,i} \left[ \overline{a_{s,i} \langle A_{-i} \rangle} - \overline{\langle a_i \rangle \langle A_{-i} \rangle} \right] \tag{29}$$

where $A_{-i} = A - a_{s_i,i}$ is the aggregate action of the opponents of agent $i$.
Apart from the factor $\Gamma_i$, this coincides with the RD of Eq. (12). Again $\sigma^2$
is minimized along the trajectories of Eq. (29): it is easy to check that the
time derivative of $\sigma^2$ is given by Eq. (13) with an extra factor $\Gamma_i$ inside the
sum on $i \in \mathcal{N}$. We therefore conclude that with exponential learning and full
information agents coordinate on a Nash equilibrium. Each agent plays, in the
long run, a pure strategy, i.e. $G = 1$. The Nash equilibrium to which agents
converge depends on the initial conditions $U_{s,i}(0)$, i.e. on prior beliefs: different
initial conditions select different Nash equilibria.

7 Statistical mechanics of naive agents

As we have shown, the stationary state of the system is described by the
ground state of the Hamiltonian $H_\eta$. This can be analyzed using the tools of
statistical physics and, in particular, the replica method, which allows us to
deal with quenched disorder (i.e. agents' heterogeneity). The details of the
calculation are given in the appendix. Here we shall just
describe the results. We shall first consider the results for $\eta = 0$, i.e.
for the original MG, for which we can derive exact results within a relatively
simple calculation. The case $\eta > 0$ requires more complex calculations which
shall be the subject of a forthcoming paper [23]. Here, the qualitative understanding
for $\eta > 0$ provided by the present approach will be supplemented by
numerical simulations.



7.1 $\eta = 0$: The minority game

It is easy to see that for $\eta = 0$, the Hamiltonian $H_0 = H$ is a non-negative
quadratic form of the variables $\pi_{s,i}$ and therefore it attains its minimum on
a connected subset $\mathcal{M}$ of the space of mixed strategies. We therefore conclude that the long run
dynamics of this system is described by the minimum of $H$. Loosely speaking,
in view of our definition of $H$, we may say that naive agents, in the minority
game, minimize the "information content" of the market output $A^{\mu(t)}(t)$.

A complete statistical characterization of the minima of $H$ in the limit $N \to \infty$
with $P/N = \alpha$ finite and $\rho^\mu = 1/P$ can be obtained with the replica method,
a tool of statistical mechanics devised to deal with disordered systems. An
account of this method is given in the appendix together with technical details
of the calculation. Here we only discuss the results and their interpretation.
We distinguish two regimes separated by a phase transition which occurs at
$\alpha = \alpha_c(S) \simeq S/2 - 0.6626\ldots$



7.1.1 Asymmetric phase: $\alpha > \alpha_c$

For $\alpha > \alpha_c$ we find an asymmetric phase. Indeed $H > 0$, which means that
$\langle A^\mu \rangle \neq 0$ at least for some $\mu \in \mathcal{P}$. The symmetry between the two actions in $\mathcal{A}$
is broken and a best strategy $a_{\rm best}^\mu = -\mathrm{sign}\,\langle A^\mu \rangle$ arises in $\mathcal{A}^P$. An $(N+1)$-th agent
who joined the game with this strategy would receive a payoff
$\overline{|\langle A \rangle|} - \overline{a_{\rm best}^2} = \overline{|\langle A \rangle|} - 1$.

Note that, on the contrary, $\sigma^2$ is not positive definite and it attains its minima,
the Nash equilibria, on a non-connected subset of the space of mixed strategies.

The set $\mathcal{M}$ where $H$ attains its minimum consists of a single point, so that,
for any initial conditions, the dynamics converges to the same final state in
the long run. In other words, prior beliefs of agents about their strategies are
irrelevant in the long run.

The asymmetry $H$ decreases with decreasing $\alpha$, which means increasing the
number of agents at fixed $P$. Naively speaking, the asymmetry in $\langle A^\mu \rangle$
is exploited by the adaptive behavior of agents, who thereby reduce it. Indeed
agents are more and more selective in their choice, as shown by the fact that
$G$ increases as $\alpha$ decreases and the effective number of strategies used, $1/G$,
decreases. At the same time, as $\alpha$ decreases, the equilibrium becomes more
and more "fragile" in the sense that its susceptibility to a generic perturbation
increases (see the Appendix and ref. [16]).

7.1.2 Phase transition and symmetric phase: $\alpha < \alpha_c$

As $\alpha \to \alpha_c$ the asymmetry vanishes ($H \to 0$) and the response of the system to a
generic perturbation diverges. This signals a phase transition to the symmetric
phase $\alpha < \alpha_c$, where $H = 0$ and any perturbation can change the
equilibrium dramatically. The set $\mathcal{M}$ where $H$ attains its minimum is no longer a single point,
but rather a hyper-plane whose dimension increases as $\alpha$ decreases. Any
point in $\mathcal{M}$ is an equilibrium of Eq. (26) and any displacement along this
set can occur freely. In particular, with different initial conditions, the system
reaches different points of $\mathcal{M}$. The dynamics (26) indeed converges to the
"closest" point of $\mathcal{M}$ which is on its trajectory. In other words, prior beliefs
$U_{s,i}(0)$ are relevant for $\alpha < \alpha_c$: with different $U_{s,i}(0)$ the system reaches
different equilibria.

7.1.3 Anti-persistence in the symmetric phase 

For $\alpha < \alpha_c$ the system is dynamically degenerate: any displacement on the set
$\mathcal{M}$ can occur freely. In particular, stochastic fluctuations can induce a motion
in $\mathcal{M}$. This is what happens for $\Gamma_i \gg 1$, where numerical simulations show
the presence of "crowd effects" [8,13,14]. This effect manifests itself in an increase
of $\sigma^2/N$ as $\alpha$ decreases, which is much faster than predicted by our
theory (see full symbols in fig. 2). This behavior can be traced back to a
dynamical anti-persistence [11] resulting from the fact that agents, neglecting
their impact on $A(t)$, over-estimate the performance of the strategies they do
not play and keep switching from one to the other. Each time a particular
state $\mu$ shows up, agents tend to take the action opposite to the one they took the
last time they saw the same state $\mu$. Therefore the period of this dynamics is
$2P$ time steps [11].

Fig. 2. Global efficiency $\sigma^2/N$ as a function of $\alpha$ for $S = 2, 3, 4$ and $5$ from numerical
simulations of the minority game with $N = 101$ agents and $\Gamma_i = \infty$ (averages
were taken after $100P$ time steps), averaged over 100 realizations of $\vec a_i$ (small full
symbols), from numerical minimization of $H$ (large open symbols) and from the
theoretical calculation (lines) with $U_{s,i}(0) = 0$.

The term $-\overline{a_{\rm best}^2}$ is the "market" impact caused by the new agent. It arises
because if the strategy $a_{\rm best}^\mu$ were actually played by the $(N+1)$-th agent, that would
also modify $A^\mu \to A^\mu + a_{\rm best}^\mu$. Ref. [17] discusses these issues in greater detail.

The analytic approach to these effects requires the study of the dynamical
solutions of Eq. (19), which goes beyond the aims of the present work. We suspect
that one should refine our continuum time approach, possibly including the
second order time derivative and, to some extent, the effects of fluctuations.
Indeed a periodic motion is usually related to the inertial term ($d^2U/d\tau^2$) of
the dynamic equation.

The periodic behavior persists as long as $1/\Gamma_i$ is much smaller than the amplitude
of the oscillations of $U_{s,i}(t)$. As $\Gamma_i$ decreases, Eq. (15) eventually smooths
the oscillations in agents' choices and the anti-persistent behavior disappears,
as indeed observed in [27].



" The fact that agents do not realize that they have an impact on the aggregate 
in this case is probably unrealistic. From their point of view, the same strategy
which had a good score when they were not using it starts performing
badly as soon as they use it. This could be considered either a manifestation
of Murphy's law or evidence that agents' rationality is bounded below the level of
common sense.

7.1.4 Global efficiency and Tragedy of Commons

As far as global efficiency is concerned, we find that $\sigma^2/N$ increases with $\alpha$
towards the random agent limit as $1 - \sigma^2/N \sim 1/\alpha$. This is shown in figure
2, which also shows that numerical simulations for finite populations fully
confirm our theoretical results. For fixed $\alpha$, as $S$ increases $\sigma^2/N$ first decreases
moderately as long as $\alpha > \alpha_c(S)$. Then the system enters the symmetric
phase ($\alpha < \alpha_c$) because $\alpha_c(S)$ increases, and $\sigma^2/N$ increases with $S$ towards
the random agent limit as $1 - \sigma^2/N \sim 1/S$. We then conclude that allowing
agents to have more strategies does not increase global efficiency as it does in the
Nash equilibrium. Rather it pushes the system into the symmetric phase, where
$\sigma^2$ converges to the random agent limit as $S \to \infty$.

As long as the system is in the asymmetric phase, agents have incentives to
consider more than $S$ strategies, because that allows them to better detect the
asymmetry. However, if every agent enlarges the set of strategies she considers,
i.e. if $S$ increases, the system enters the symmetric phase. Then global
efficiency starts to decline. This behavior is reminiscent of the Tragedy of the
Commons [30]: a situation where individual utility maximization by all agents
leads to over-exploitation of common resources and poor payoffs.

7.2 Rewarding the played strategy: $\eta > 0$

We expect that with $\eta = 1$ agents behave almost optimally, in the sense that
they converge to a stationary state which is close to a Nash equilibrium. Given
the difference in the collective behavior of agents in the two cases, which may
be appreciated by comparing figure 2 for $\eta = 0$ with figure 1 for $\eta = 1$, it is
natural to ask what happens when $\eta$ changes continuously from 0 to 1.

Figure 3 shows the analytical prediction for the dependence on $\eta$ of $\sigma^2/N$
for $S = 2$. These predictions are based on the replica symmetric ansatz, which is only
valid for $\alpha > \alpha_{\rm RSB}(\eta)$, where $\alpha_{\rm RSB}(\eta)$ marks a replica symmetry breaking
phase transition, which will be discussed elsewhere in detail [23]. Here we just
mention that a^sBiv) = for < 0, Q!rsb(0) = c^c and ci;rsb('7) > 1 — l/\/7ia 
(for S — 2 and) 77 > 0. For a < q;rsb(^) the analytical results derived in the 
appendix provide an approximate description of the behavior of the system
which is, however, sufficient to appreciate the relevant features.

The most striking consequence of the result in fig. 3 is that the behavior
of $\sigma^2/N$ is quite different for $\alpha > \alpha_c$ and for $\alpha < \alpha_c$. Indeed for large $\alpha$,
$\sigma^2/N$ changes continuously with $\eta$, whereas $\sigma^2/N$ drops discontinuously to
zero as $\eta \to 0^+$ for small $\alpha$. This feature is reproduced in figure 4 for two
characteristic values of $\alpha$, also shown as arrows in fig. 3. We show both the
behavior derived from the numerical minimization of $H_\eta$ and the behavior
of the modified minority game with rewarding. Numerical results agree quite
well in an intermediate range of values of $\eta$, whereas for $\eta > 0.5$ or $\eta < -0.2$
some discrepancy, which we believe is due to finite size effects, is found.
This effect can be even more spectacular when anti-persistence effects occur.
Indeed the jump of $\sigma^2/N$ at $\eta = 0$ can be of several orders of magnitude!

Fig. 3. Theoretical estimate of global efficiency $\sigma^2/N$ as a function of $\alpha$ for $S = 2$
and several values of $\eta$ within the replica symmetric ansatz.

Fig. 4. $\sigma^2/N$ as a function of $\eta$ for $S = 2$ and $\alpha \simeq 0.079 < \alpha_c \simeq 0.3374$ and
$\alpha \simeq 0.63 > \alpha_c$. Results both of numerical simulations of the minority game and of
the numerical minimization of $H_\eta$ are shown.

The origin of this behavior lies in the dynamic degeneracy of the system for
$\alpha < \alpha_c$ and $\eta = 0$. Even an infinitesimal change in $\eta$ can dramatically alter
the nature of the minima of $H_\eta$: for negative $\eta$ there is only one minimum,
which becomes shallower and shallower as $\eta \to 0^-$. At $\eta = 0$ the minimum is
still unique but it is no longer point-like. Rather it is a connected set $\mathcal{M}$. An
infinitesimal positive value of $\eta$ is enough to lift this degeneracy and select only
some extreme points of $\mathcal{M}$ as the minima of $H_\eta$. The set of minima becomes
suddenly disconnected. At fixed $\alpha < \alpha_c$, varying $\eta$ across the transition, $H_\eta$
changes continuously, with a discontinuity in its first derivative, whereas $G$
and hence $\sigma^2/N$ change discontinuously with a jump.

The potential implications of this result are quite striking: rewarding the strategy
played more than those which have not been played, by a small amount, is
always advantageous, both individually (see below) and globally. In particular,
an infinitesimal reward is sufficient to avoid crowd effects when $\alpha$ is small and
to reduce the fluctuations by a finite amount.



8 From the agent's viewpoint 

Let us consider the behavior of agents in some detail (see also ref. [17] for a
related discussion). Our goal is to show that individual payoffs increase when
$\eta$ increases in $[0,1]$. This means that rewarding the strategy "in vivo", i.e. the one
which has actually been played, is convenient for each agent. We focus on one
agent, say $i$, and assume that the others play strategies $s_{-i}(t)$ according to some
stationary probability distribution $\pi_{-i}$. If $\mu(t)$ is drawn randomly from $\rho^\mu$ we
can consider $A_{-i}^{\mu(t)}(t) = \sum_{j \neq i} a_{s_j(t),j}^{\mu(t)}$ as a stationary process. Since we deal
with one agent, we shall drop the subscript $i$ in this section. Also we first focus
on the case $\eta < 1$ and only afterwards discuss the case $\eta > 1$. In the long run the
perceived performance of strategy $s$ is

$$\langle \Delta U_s \rangle = -\overline{a_s \langle A_{-i} \rangle} - \vec\pi \cdot \overline{\vec a\, a_s} + \eta \pi_s \simeq -\overline{a_s \langle A_{-i} \rangle} - (1 - \eta)\pi_s \tag{30}$$

where the approximation in Eq. (30) holds for $P \gg 1$ since $\overline{a_s a_{s'}} \sim 1/\sqrt{P}$ for
$s' \neq s$. Because of Eq. (25), strategies can either i) have $\pi_s > 0$ and $\langle \Delta U_s \rangle = v$
independent of $s$, or ii) have $\pi_s = 0$ and $\langle \Delta U_s \rangle < v$. This can be understood
by a rather simple argument: imagine that strategy 1 has $\langle \Delta U_1 \rangle > v$. Then
by the very learning dynamics, the agent shall use strategy 1 more frequently
than the others and hence $\pi_1$ shall increase. Because of the last term in Eq. (30),
that will decrease the perceived performance $\langle \Delta U_1 \rangle$ of that strategy. On the
other hand, if $\langle \Delta U_1 \rangle < v$ the agent shall use it less frequently, hence its $\langle \Delta U_1 \rangle$
shall increase. If $\langle \Delta U_1 \rangle < v$ even when $\pi_1 \to 0$, then the agent will never play
that strategy, i.e. $\pi_1 = 0$.

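Let $n \le S$ be the number of strategies with $\pi_s > 0$ and let these be labeled
by $s = 1, \ldots, n$, whereas $\pi_k = 0$ for $k > n$. Taking the sum of Eq. (30) over
$s = 1, \ldots, n$ we find

$$v = -\frac{1}{n} \sum_{s=1}^{n} \overline{a_s \langle A_{-i} \rangle} - \frac{1 - \eta}{n},$$

where $n$ is fixed by the condition $v > -\overline{a_k \langle A_{-i} \rangle}$ for all $k > n$. Clearly
$-\overline{a_s \langle A_{-i} \rangle} > v > -\overline{a_k \langle A_{-i} \rangle}$ for any $s \le n$ and $k > n$, hence the $n$ strategies
which the agent uses are the $n$ most efficient ones. Then Eq. (30) becomes

$$\pi_s = \frac{1}{n} + \frac{1}{1 - \eta} \left[ \frac{1}{n} \sum_{s'=1}^{n} \overline{a_{s'} \langle A_{-i} \rangle} - \overline{a_s \langle A_{-i} \rangle} \right].$$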



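Note that strategies with a larger $-\overline{a_s \langle A_{-i} \rangle}$ are played more frequently. The
average payoff $u = -\vec\pi \cdot \overline{\vec a \langle A_{-i} \rangle} - 1$ delivered by a learning behavior with
parameter $\eta$ is

$$u = -\frac{1}{n} \sum_{s=1}^{n} \overline{a_s \langle A_{-i} \rangle} + \frac{1}{1 - \eta} \sum_{s=1}^{n} \left[ \overline{a_s \langle A_{-i} \rangle} - \frac{1}{n} \sum_{s'=1}^{n} \overline{a_{s'} \langle A_{-i} \rangle} \right]^2 - 1, \tag{31}$$

which is an increasing function of $\eta$ for $\eta < 1$. Indeed at fixed $n$, this is trivially
true. With some more algebra, it is easy to check that $n$ is a non-increasing
function of $\eta$ and that $u$ increases as $n$ decreases. This means that for $\eta < 1$
average payoffs are non-decreasing functions of $\eta$, as claimed.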
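The stationary conditions above amount to a small "water-filling" computation, which can be sketched as follows. The function names `stationary_mixture` and `average_payoff` are hypothetical, and the input `x[s]` is assumed to hold the value $-\overline{a_s \langle A_{-i} \rangle}$ of each strategy, taken as given. The sketch covers $\eta < 1$ only.

```python
def stationary_mixture(x, eta):
    """Solve x_s - (1 - eta) * pi_s = v on the used strategies, pi_s = 0 elsewhere.

    x[s] stands for -mean_mu(a_s * <A_-i>), the long-run payoff of strategy s
    against the other agents. Returns (pi, v, n).
    """
    if eta >= 1.0:
        raise ValueError("this sketch covers eta < 1 only")
    # Rank strategies from the most to the least efficient one.
    order = sorted(range(len(x)), key=lambda s: x[s], reverse=True)
    n, v = 0, None
    for k in range(1, len(x) + 1):
        mean_k = sum(x[s] for s in order[:k]) / k
        v_k = mean_k - (1.0 - eta) / k
        # The k-th best strategy is used only if it beats the common level v_k.
        if x[order[k - 1]] > v_k:
            n, v = k, v_k
        else:
            break
    pi = [0.0] * len(x)
    for s in order[:n]:
        pi[s] = (x[s] - v) / (1.0 - eta)
    return pi, v, n

def average_payoff(x, pi):
    """u = sum_s pi_s * x_s - 1, the average payoff of the mixture pi."""
    return sum(p * xs for p, xs in zip(pi, x)) - 1.0

if __name__ == "__main__":
    x = [1.0, 0.5, -2.0]
    for eta in (0.0, 0.5):
        pi, v, n = stationary_mixture(x, eta)
        print(eta, pi, n, average_payoff(x, pi))
```

In this example the agent mixes the two best strategies for `eta = 0` and concentrates on the best one for `eta = 0.5`, with a higher average payoff in the second case.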

When $\eta \to 1$ the only possible solution is that with $n = 1$, which means that
the agent plays her best response to $A_{-i}$. For $\eta > 1$ the agent over-weights
the performance of her strategies. As a result she sticks to only one of her
strategies, i.e. $n = 1$, but that need not be her best one. Without entering into
too many details, let us only mention that for $\eta > 1$ the agent always plays
one strategy, which is dynamically selected by initial conditions and stochastic
fluctuations.



9 Exogenous vs endogenous information 



In the El Farol problem and in the MG the state $\mu(t)$ is determined by the
outcome of past games. In other words $\mu(t)$ is an endogenous information
which encodes information on the game itself: agents record which has been
the winning action in the last $M = \log_2 P$ games and store this information
in the binary representation of the integer $\mu$. This means that $\mu$ is updated at
each time as:

$$\mu(t+1) = \mathrm{mod}\left[ 2\mu(t) + \frac{1 - \mathrm{sign}\,A(t)}{2},\, P \right], \qquad A(t) = \sum_{i \in \mathcal{N}} a_{s_i(t),i}^{\mu(t)}. \tag{32}$$

Note that $A(t)$ depends on time both through $\mu(t)$ and through the choice
$s_i(t)$ of each agent $i \in \mathcal{N}$ at time $t$. Eq. (32) implies that the dynamics of $\mu(t)$
is defined by the collective behavior of the game itself. Still, payoffs do not
depend on $\mu(t)$, which is a sun-spot. However, agents can coordinate in such
a way that some state, i.e. some pattern in the time series of $A(t)$, can
occur more frequently than some other, and possibly some may never occur.
As far as the collective behavior in the stationary state is concerned, the only
relevant information on this dynamics is the stationary state distribution $\rho^\mu$
of the process Eq. (32). In the long run, the distribution $\rho^\mu$ is determined
by the collective behavior of agents through $A(t)$. Technically, the problem of
computing $\rho^\mu$ is related to the diffusion of a particle on the directed graph
defined by Eq. (32), where each node $\mu \in \mathcal{P}$ has two outgoing links to nodes
$\mathrm{mod}(2\mu, P) + 1$ and $\mathrm{mod}(2\mu + 1, P) + 1$ (and two incoming links). Stochastic
processes on this graph, known as the De Bruijn graph, are called shift register
sequences [31] in computer science.
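The shift-register structure of the endogenous information can be illustrated with a short sketch. It labels states from 0 to $P - 1$ with $P = 2^M$, and it records a bit equal to 1 when $A(t) < 0$; both the labelling and the bit convention are assumptions made here for illustration, as are the function names.

```python
def next_state(mu, A, P):
    """Shift-register update: append the latest outcome bit to the history mu.

    States are labelled 0 .. P-1 (P = 2**M); the bit convention
    (1 when A < 0) is an assumption of this sketch.
    """
    bit = 1 if A < 0 else 0
    return (2 * mu + bit) % P

def successors(mu, P):
    """The two nodes reachable from mu on the De Bruijn graph."""
    return ((2 * mu) % P, (2 * mu + 1) % P)

if __name__ == "__main__":
    P, mu = 8, 0
    for A in (-3, 5, -1):
        mu = next_state(mu, A, P)
    print(mu)
```

After $M$ updates the state depends only on the last $M$ outcomes, whatever the initial state, and every node has exactly two outgoing and two incoming links.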

This differs from the setting we have discussed so far, in which $\mu(t)$ is independently
drawn at each time $t$ with $\rho^\mu = 1/P$. We may call $\mu$ exogenous
information in this case, since it can be considered to encode information about
an external system, possibly the environment where agents live. This version
of the MG was first introduced by Cavagna [10]. He found that
in numerical simulations the collective behavior with exogenous information
differs only weakly from that under endogenous information. Having already
discussed the results for the exogenous case, let us now consider how these
change under endogenous information.

9.1 Naive agents with $\eta \le 0$ and endogenous information

With endogenous information the system behaves qualitatively in the same
way, as first observed in ref. [10] by numerical simulations for $\eta = 0$. This
is because the stationary state distribution $\rho^\mu$ of the process $\mu(t)$, which is
induced by the dynamics of agents through Eq. (32), is almost uniform on
$\mathcal{P}$. Actually $\rho^\mu = 1/P$ for $\alpha < \alpha_c$ because of the symmetry of $A^\mu$.

In order to measure the deviation of $\rho^\mu$ from the uniform distribution $\rho^\mu = 1/P$,
we compute the entropy $\Sigma(P) = -\sum_{\mu \in \mathcal{P}} \rho^\mu \log_P \rho^\mu$. With the choice
of base $P$ for the logarithm, $\Sigma(P) = 1$ for $\rho^\mu = 1/P$, so that $1 - \Sigma(P)$ is
a reasonable measure of the deviation of $\rho^\mu$ from a uniform distribution. In
figure 5, $1 - \Sigma(P)$ is plotted for several values of $P$ as a function of $\alpha$. While
for $\alpha < \alpha_c$ we find $\Sigma(P) = 1$ to a great accuracy, for $\alpha > \alpha_c$ numerical results
suggest that $\Sigma(P) \to 1$ as $P = \alpha N \to \infty$. On this basis, we conclude, as in
ref. [10], that the MG with endogenous information gives the same results as
the MG with exogenous information. By a detailed study of the dynamics of
the process $\mu(t)$ one can actually give a deeper theoretical foundation to this
conclusion and derive this result analytically [32]. We conjecture that, as long
as agents' choice remains stochastic ($G < 1$), as for $\eta \le 0$, the dynamics of $\mu(t)$
is ergodic in $\mathcal{P}$. Note indeed that, for any $\mu$, $A(t)$ has stochastic fluctuations
around its average value $\langle A^\mu \rangle$ which are of the same order of magnitude as the
average itself.

Fig. 5. Deviation of the distribution $\rho^\mu$ from the uniform one, from numerical
simulations with $\eta = 0$.
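The entropy measure used above is straightforward to evaluate from an observed sequence of states. The sketch below assumes states labelled from 0 to $P - 1$ with $P \ge 2$; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy_base_P(rho):
    """Entropy of a distribution over P states, with logarithm in base P."""
    P = len(rho)
    # Terms with rho = 0 contribute nothing and are skipped.
    return -sum(r * math.log(r, P) for r in rho if r > 0.0)

def empirical_distribution(states, P):
    """Relative frequency of each state 0 .. P-1 in an observed sequence."""
    counts = Counter(states)
    return [counts.get(mu, 0) / len(states) for mu in range(P)]

if __name__ == "__main__":
    rho = empirical_distribution([0, 1, 1, 3], 4)
    print(1.0 - entropy_base_P(rho))
```

The quantity `1.0 - entropy_base_P(rho)` vanishes for a uniform distribution and equals 1 when a single state carries all the weight.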



9.2 $\eta > 0$ and agents with full endogenous information

A qualitatively different situation arises when $\eta > 0$ and in particular when
agents have full information. We shall mainly discuss the latter case, which
corresponds to $\eta = 1$, and then briefly discuss the generic $\eta > 0$ case. The
key observation is that agents for $\eta = 1$ play pure strategies ($G = 1$). This
means that agents behave the same whenever $\mu(t) = \mu$ and accordingly $A(t)$
shall always take the same value $A^\mu$ each time $\mu(t) = \mu$. This implies that the
dynamics of Eq. (32) for $\mu(t)$ becomes deterministic. More precisely it locks
into a periodic orbit $\mu(t + T) = \mu(t)$ with some period $T$. Only the values
of $\mu$ in this orbit shall occur in the long run, whereas all other values of $\mu$
shall never occur. This means that agents strongly influence the time series
of $\mu(t)$ and hence of the aggregate $A(t)$. Most remarkably, in doing so, they
achieve a much better coordination with respect to the exogenous information
case, because they reduce the parameter $\alpha = P/N$ to $\tilde\alpha = T/N$. Numerical
studies show that $T \propto \sqrt{P}$ so that, in the limit $N \to \infty$ with $P/N = \alpha$ finite,
$\tilde\alpha \propto 1/\sqrt{N}$ and also $\sigma^2/N$ takes a very small value.

The same behavior is expected for all $\eta > 0$ such that $G = 1$. Therefore
we expect that for each $\alpha$ there is a particular value $\eta_{\rm EB}(\alpha)$ beyond
which ergodicity of the dynamics of $\mu(t)$ in $\mathcal{P}$ breaks down. For values of
$\eta > \eta_{\rm EB}(\alpha)$ we expect the dynamics of $\mu$ to lock into periodic orbits, causing
a reduction of $\sigma^2$ similar to that discussed above.



10 Discussion 

There is a growing literature on learning which addresses the issue of which learning procedure (modeling inductive rationality) may eventually lead to deductive rational outcomes [3, 24]. The choice made in the El Farol bar problem and in the MG, which is exponential learning (Eq. 15), possibly with $\Gamma_i \to \infty$, is one of these, as we have shown. What leads to equilibria different from Nash equilibria is the fact that agents i) do not have full information on the effects of their strategies and ii) neglect or do not account properly for their impact on the aggregate in the evaluation of their strategies.
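As a reminder of what exponential learning means in practice, the sketch below assumes the usual logit form, in which strategy $s$ is chosen with probability proportional to $e^{\Gamma U_s}$, where $U_s$ is the score accumulated by that strategy. This form is an assumption about Eq. (15), which is not reproduced in this section, and the function name is hypothetical.

```python
import math


def choice_probabilities(scores, gamma):
    """Logit rule: probability of strategy s proportional to exp(gamma * U_s)."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(gamma * (u - top)) for u in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    scores = [0.2, -0.1, 0.5]
    for gamma in (0.0, 1.0, 100.0):
        # gamma = 0 gives uniform mixing; large gamma concentrates on the best score.
        print(gamma, [round(p, 3) for p in choice_probabilities(scores, gamma)])
```

For $\Gamma \to \infty$ the rule reduces to always playing the strategy with the highest score, which is the limit referred to in the text.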

In particular, for $\eta = 0$, this new equilibrium, which we call naive agents equilibrium (NAE), differs substantially from a Nash equilibrium (NE), because:

(1) In a NE global efficiency always increases as the number $N$ of agents increases (with $P$ fixed), whereas in the NAE it only decreases as long as $N < P/\alpha_c$ and then it increases in a way which depends on initial conditions (prior beliefs) and on the parameters $\Gamma_i$.

(2) There are (exponentially) many disconnected Nash equilibria which are selected by initial conditions, i.e. by prior beliefs. For $\alpha > \alpha_c$ there is a unique NAE and, for all initial conditions, the system converges to it. For $\alpha < \alpha_c$ there is a continuum of NAE, but they are all connected.

(3) Global efficiency ($\sigma^2$), for fixed $\alpha$, always decreases as agents' resources ($S$) increase, and it eventually converges to perfect optimization for $S \to \infty$. In the NAE efficiency only mildly improves with $S$ in the asymmetric phase. But increasing $S$ also increases $\alpha_c(S)$, and when $\alpha_c(S) > \alpha$ the system enters the symmetric phase, where $\sigma^2$ increases with $S$ towards the random agent limit (this occurs for $\Gamma_i$ small [27]).

Footnote to Section 9.2: Note that the values of $\mu$ which occur in the long run are sampled uniformly.

Footnote to Section 9.2: The result $T \propto \sqrt{P}$ is what one would obtain on a random directed graph with $P$ vertices, each with two outgoing links. This is a reasonable approximation because the dynamics locks into periodic loops of $T \propto \sqrt{P}$ vertices where the peculiar structure of De Bruijn graphs (see Eq. (32)) does not play a significant role.

(4) In the NE agents play pure strategies, i.e. $\pi_i$ is a singleton $\forall i \in \mathcal{N}$. Indeed, for fixed opponent strategies $s_{-i}$, each agent typically has a pure strategy $s_i$, the best response, which is superior to the others. In the NAE, agents mix this strategy with others because, neglecting their impact on $A(t)$, they over-estimate the performance of the strategies they do not play. Playing a pure strategy reduces its perceived performance and this is why agents mix strategies in the NAE (see also ref. [17]). The probability $\pi_{s,i}$ in the NAE is such that the perceived performance of all strategies which are played ($\pi_{s,i} > 0$) is the same.

(5) A consequence of the previous point is that the origin of information is quite important for inductive agents with full information, while it is irrelevant in the NAE [10]. Inductive agents with full information lead, under endogenous information, to a deterministic dynamics of $\mu(t)$, and only a small subset of the information values $\mu$ is ever visited. This in turn leads to a much more efficient coordination. Naive agents with endogenous information, on the other hand, induce a dynamics on $\mu(t)$ which is "ergodic", i.e. which visits each information $\mu$ with nearly the same frequency $\rho^\mu$. Therefore the collective behavior is the same as that of the NAE with exogenous information.

For intermediate values of $\eta$ the collective behavior of agents interpolates between these two situations in a continuous way for $\alpha > \alpha_c$ or in a discontinuous way for $\alpha < \alpha_c$.

We expect that rewarding the strategy which is played with respect to those which are not played should be advantageous both individually and globally in more general situations where agents interact through a global variable via a minority-like mechanism. Indeed one can show that the qualitative picture we have described remains the same if we allow for heterogeneity of various sorts, such as allowing for a dependence on $i$ of $S$ and $\eta$, or changing the distribution in Eq. (2) to a generic $P_i(a)$ for $a \in \mathbb{R}$ [28] (see the appendix).
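The effect of rewarding the played strategy can be explored with a small simulation. The sketch below is a toy version under several assumptions that are not taken from this section: $S = 2$ strategies per agent, the limit $\Gamma_i \to \infty$ (each agent plays her best-scoring strategy, ties broken at random), exogenous uniform information, and a score update of the form $U_{s,i} \to U_{s,i} - a^\mu_{s,i} A/P + (\eta/P)\,\delta_{s,s_i(t)}$. The function name and all parameter values are hypothetical.

```python
import random


def sigma2_per_agent(N=101, P=32, eta=0.0, steps=4000, seed=0):
    """Estimate sigma^2 / N in a toy minority game with S = 2 strategies."""
    rng = random.Random(seed)
    # a[i][s][mu] = +/-1 action of strategy s of agent i under information mu.
    a = [[[rng.choice((-1, 1)) for _ in range(P)] for _ in range(2)]
         for _ in range(N)]
    U = [[0.0, 0.0] for _ in range(N)]  # no prior beliefs
    acc, count = 0.0, 0
    for t in range(steps):
        mu = rng.randrange(P)  # exogenous, uniformly drawn information
        picks = []
        for i in range(N):
            if U[i][0] == U[i][1]:
                picks.append(rng.randrange(2))  # tie broken at random
            else:
                picks.append(0 if U[i][0] > U[i][1] else 1)
        A = sum(a[i][picks[i]][mu] for i in range(N))
        for i in range(N):
            for s in (0, 1):
                U[i][s] -= a[i][s][mu] * A / P  # minority payoff of strategy s
            U[i][picks[i]] += eta / P  # assumed reward for the played strategy
        if t >= steps // 2:  # discard the first half as transient
            acc += A * A
            count += 1
    return acc / count / N


if __name__ == "__main__":
    for eta in (0.0, 1.0):
        print(eta, round(sigma2_per_agent(eta=eta), 3))
```

Comparing the printed values for $\eta = 0$ and $\eta = 1$ at several ratios $P/N$ gives a rough numerical feel for the statement above; the run lengths used here are short and the estimates are correspondingly noisy.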

Our results clearly allow for several extensions, such as those of ref. [17]. They also suggest a theoretical approach to the El Farol problem [1]. We expect the distinction between inductive agents with full information and naive agents to be of key importance also in the El Farol problem, and we believe that, eventually, an analytical solution in the limit $N \to \infty$ is possible along the same lines followed here.

Footnote to point (4) above: Typically here means that we disregard unlikely realizations where two pure strategies happen to yield the same payoffs.

Footnote on the El Farol problem: The key problem lies in the parametrization of forecasting rules: agents in the El Farol problem consider the record of past attendance to the bar, i.e. $A(t')$ for $t' < t$, whereas in the minority game agents only consider the record of the sign of $A(t')$. Focusing on the last $M$ games, there can be $N^M$ possible records in the El Farol problem, instead of $2^M$. This causes no problem in principle since one can take $P = N^M$. In practice however forecasting rules have to be "reasonable": for example, a randomly drawn rule can easily predict a different outcome if the attendance of one of the past weeks just changes by one unit. Some sort of continuity in strategies should be introduced so that "similar" histories (= information $\mu$) lead to similar forecasts. Even though it is not clear how to translate this requirement mathematically (one way could be that followed in ref. [33]), it is clear that it suggests that the number of relevant informations $P$ is much less than $N^M$.



A Appendix: Replica calculation for the MG



Our goal is to compute and characterize the minimum of $H_\eta = \overline{\langle A \rangle^2} - \eta N G$, with $G$ given by Eq. (9), in $\Delta^N = \{\vec\pi_i,\ i \in \mathcal{N}\}$. Considering $H_\eta$ as the Hamiltonian of a statistical mechanical system, this can be done by analyzing the zero temperature limit. First we build the partition function

$$Z(\beta) = \mathrm{Tr}_\pi\, e^{-\beta H_\eta\{\pi\}}, \qquad \text{(A.1)}$$

where $\beta$ is the inverse temperature and $\mathrm{Tr}_\pi$ stands for an integral on $\Delta^N$ (we call simply $\pi$ an element of $\Delta^N$). The quantity of interest is then

$$\min_{\pi \in \Delta^N} H_\eta\{\pi\} = -\lim_{\beta \to \infty} \beta^{-1} \ln Z(\beta). \qquad \text{(A.2)}$$

This in principle depends on the specific realization $a^\mu_{s,i}$ of rules chosen by agents. In practice however, to leading order in $N$, all realizations of $a^\mu_{s,i}$ yield the same limit, which then coincides with the average of $\min_{\pi \in \Delta^N} H_\eta\{\pi\}$ over $a^\mu_{s,i}$. The average of $\ln Z$ over the $a$'s, which we denote by $\langle \ldots \rangle_a$, is reduced to that of moments of $Z$ using the replica trick [16]:

$$\langle \ln Z \rangle_a = \lim_{n \to 0} \frac{1}{n} \ln \langle Z^n \rangle_a. \qquad \text{(A.3)}$$

With integer $n$ the calculation of $\langle Z^n \rangle_a$ amounts to studying $n$ replicas of the same system with the same realization of $a^\mu_{s,i}$. To do this we introduce a set of dynamical variables $\pi_a = \{\pi_{s,i,a}\}$ for each replica, which are labeled by the additional index $a = 1, \ldots, n$. Each replica has its corresponding Hamiltonian, which we write as $H_\eta\{\pi_a\} = \overline{A_a^2} - \eta N G_{a,a}$ where $A_a^\mu = \sum_{i \in \mathcal{N}} \vec\pi_{i,a} \cdot \vec a_i^\mu$ and $N G_{a,a} = \sum_i |\vec\pi_{i,a}|^2$ (the reason for this notation shall become clear later). The set of all dynamical variables for all replicas is the direct product $\Delta^{Nn}$ of $n$ phase spaces $\Delta^N$. In order to compute the limit $n \to 0$ in Eq. (A.3) one appeals to analytic continuation of $\langle Z^n \rangle_a$ for real $n$. We give here the details of the calculation in our specific case. More details on the nature of the method can be found in ref. [16]. We write



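$$\langle Z^n \rangle_a = \mathrm{Tr}_\pi \Big\langle \prod_{a=1}^{n} \prod_{\mu \in \mathcal{P}} e^{-\beta \rho^\mu (A_a^\mu)^2} \Big\rangle_a\, e^{\beta \eta N \sum_a G_{a,a}}$$

$$= \mathrm{Tr}_\pi \Big\langle \prod_{\mu \in \mathcal{P}} \prod_{a=1}^{n} E_z\Big[ e^{i \sqrt{2\beta\rho^\mu}\, z_a^\mu A_a^\mu} \Big] \Big\rangle_a\, e^{\beta \eta N \sum_a G_{a,a}} \qquad \text{(A.4)}$$

where $E_z[\ldots]$ stands for the expectation over the Gaussian variable $z$ (unit variance and zero mean), and we have introduced one such variable $z_a^\mu$ for each $a$ and $\mu$, using the identity $E_z[e^{ixz}] = e^{-x^2/2}$. In addition we used the shorthand $\mathrm{Tr}_\pi$ for the integral over $\Delta^{Nn}$. The average over $a^\mu_{s,i}$ now factorizes

$$\Big\langle \prod_{\mu \in \mathcal{P}} \prod_{a=1}^{n} e^{i \sqrt{2\beta\rho^\mu}\, z_a^\mu A_a^\mu} \Big\rangle_a = \prod_{\mu \in \mathcal{P}} \prod_{i \in \mathcal{N}} \prod_{s=1}^{S} \Big\langle e^{i \sqrt{2\beta\rho^\mu}\, a^\mu_{s,i} \sum_{a=1}^{n} z_a^\mu \pi_{s,i,a}} \Big\rangle_a$$

and we can explicitly compute it using the distribution (2). This gives, for each $\mu$,

$$\prod_{i \in \mathcal{N}} \prod_{s=1}^{S} \cos\Big( \sqrt{2\beta\rho^\mu} \sum_{a=1}^{n} z_a^\mu \pi_{s,i,a} \Big) \simeq \exp\Big[ -\beta\rho^\mu \sum_{a,b=1}^{n} z_a^\mu z_b^\mu \sum_{i \in \mathcal{N}} \vec\pi_{i,a} \cdot \vec\pi_{i,b} \Big].$$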



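In the last passage we used the relation $\cos x \simeq e^{-x^2/2}$, which is correct to order $x^2$ in a power expansion. This is justified as long as the argument of the cosine vanishes as $P = \alpha N \to \infty$ for each $\mu \in \mathcal{P}$. Note that, for this reason, we would have got the same result for any generic distribution $P_i(a)$ of $a^\mu_{s,i}$ such that $\langle a \rangle = 0$ and $\langle a^2 \rangle = 1$. This allows us to understand why models with continuum strategies $a^\mu_{s,i} \in \mathbb{R}$, such as the one proposed in ref. [27], yield the same results as the one with binary strategies, which we are discussing here. Before going back to Eq. (A.4), we introduce the matrices $\hat G = \{G_{a,b},\ a,b = 1, \ldots, n\}$ and $\hat r = \{r_{a,b},\ a,b = 1, \ldots, n\}$ through the identities

$$1 = \int dG_{a,b}\, \delta\Big( G_{a,b} - \frac{1}{N} \sum_{i \in \mathcal{N}} \vec\pi_{i,a} \cdot \vec\pi_{i,b} \Big) \propto \int dr_{a,b}\, dG_{a,b}\, e^{-\frac{\alpha\beta^2}{2} r_{a,b} \left( N G_{a,b} - \sum_{i \in \mathcal{N}} \vec\pi_{i,a} \cdot \vec\pi_{i,b} \right)}$$

for all $a \ge b$, where $\delta(x)$ is Dirac's delta function and we used its integral representation. The only part depending on the $\vec\pi_{i,a}$ in $\langle Z^n \rangle_a$ is $e^{\frac{\alpha\beta^2}{2} \sum_{a \ge b} \sum_i r_{a,b}\, \vec\pi_{i,a} \cdot \vec\pi_{i,b}}$. This can be factorized in the agent's index $i$, and so the integral $\mathrm{Tr}_\pi$ on $\Delta^{Nn}$ can be factorized into $N$ integrals over $\Delta^n$ (= the direct product of the simplexes of the $n$ replicas of the same agent's mixed strategies). With this we can write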

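$$\langle Z^n \rangle_a = \int d\hat r\, d\hat G\, e^{-N n \beta F(\hat G, \hat r)} \qquad \text{(A.5)}$$

where, specializing to the case $\rho^\mu = 1/P$,

$$F(\hat G, \hat r) = \frac{\alpha}{2 n \beta} \ln \det\Big[ \hat I + \frac{2\beta}{\alpha} \hat G \Big] + \frac{\alpha\beta}{2n} \sum_{a,b} r_{a,b} G_{a,b} - \frac{1}{n\beta} \ln\Big[ \mathrm{Tr}_\pi\, e^{\frac{\alpha\beta^2}{2} \sum_{a,b} r_{a,b}\, \vec\pi_a \cdot \vec\pi_b} \Big] - \frac{\eta}{n} \sum_a G_{a,a} \qquad \text{(A.6)}$$

where $\hat I$ is the identity matrix. The first term arises from the expectation over $z_a^\mu$. This factorizes for each $\mu$ and one is left with a Gaussian integral over $z \in \mathbb{R}^n$. The second and the third terms arise from the integral representation of the delta function.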



The key point is that, in the limit $N \to \infty$, the integrals over the matrices $\hat r$ and $\hat G$ in Eq. (A.5) are dominated by their saddle point value, i.e. by the values of $r_{a,b}$ and $G_{a,b}$ for which $F$ attains its minimum value. One should then study the first order conditions $\partial F/\partial r_{a,b} = 0$ and $\partial F/\partial G_{a,b} = 0$ for all $a, b$. Here we focus on the replica symmetric approximation, where we assume that the matrices for which $F$ attains its extreme have the form

$$G_{a,b} = g + (G - g)\,\delta_{a,b}, \qquad r_{a,b} = r + (R - r)\,\delta_{a,b}. \qquad \text{(A.7)}$$
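The replica symmetric form is convenient because a matrix with one value on the diagonal and another everywhere else has only two distinct eigenvalues, so its determinant is known in closed form for any $n$. The sketch below checks this numerically in plain Python; the helper names and the values $n = 5$, $G = 0.8$, $g = 0.3$ are arbitrary choices made for illustration.

```python
def rs_matrix(n, diag, off):
    """n x n matrix with `diag` on the diagonal and `off` elsewhere."""
    return [[diag if a == b else off for b in range(n)] for a in range(n)]


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, det = len(A), 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if abs(A[p][k]) < 1e-14:
            return 0.0
        if p != k:
            A[k], A[p] = A[p], A[k]
            det = -det  # a row swap flips the sign
        det *= A[k][k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
    return det


if __name__ == "__main__":
    n, G, g = 5, 0.8, 0.3
    M = rs_matrix(n, G, g)
    # The uniform vector is an eigenvector with eigenvalue (G - g) + n * g.
    print(matvec(M, [1.0] * n), (G - g) + n * g)
    # Any vector summing to zero is an eigenvector with eigenvalue G - g.
    print(matvec(M, [1.0, -1.0] + [0.0] * (n - 2)), G - g)
    # Hence det = (G - g)^(n - 1) * (G - g + n * g).
    print(determinant(M), (G - g) ** (n - 1) * (G - g + n * g))
```

Because the closed form depends on $n$ only through explicit powers and factors, it can be continued to real $n$ and expanded around $n = 0$, which is what the replica limit requires.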



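This ansatz is correct for $\eta < 0$ and for $\eta > 0$ and $\alpha$ large enough [23]. The reason for this is that $H_\eta$ is a non-negative definite quadratic form in $\Delta^N$. Hence it has a very simple energy landscape, characterized by a single valley. Taking the limit $n \to 0$, Eq. (A.3) then gives

$$F^{(RS)}(G, g, R, r) = \frac{\alpha g}{\alpha + 2\beta(G - g)} + \frac{\alpha}{2\beta} \ln\Big[ 1 + \frac{2\beta(G - g)}{\alpha} \Big] - \eta G + \frac{\alpha\beta}{2}(RG - rg) - \frac{1}{\beta} E_z \ln \mathrm{Tr}_\pi \exp\big[ -\beta V_z(\vec\pi) \big] \qquad \text{(A.8)}$$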



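where $\mathrm{Tr}_\pi$ is now the integral over the simplex $\Delta$ of a single agent's mixed strategies and we defined, for convenience, the potential $V_z(\vec\pi) = -\sqrt{\alpha r}\, \vec z \cdot \vec\pi - \frac{\alpha\beta}{2}(R - r)|\vec\pi|^2$. The parameters $g$, $G$, $r$ and $R$ are fixed by the first order conditions $\partial F^{(RS)}/\partial g = 0$, $\partial F^{(RS)}/\partial G = 0$, $\partial F^{(RS)}/\partial r = 0$ and $\partial F^{(RS)}/\partial R = 0$. These equations, finally, have to be studied in the limit $\beta \to \infty$, where one recovers the minimum of $H_\eta$ by Eq. (A.2), i.e.

$$\lim_{N \to \infty} \min_{\pi \in \Delta^N} \frac{H_\eta\{\pi\}}{N} = -\lim_{N \to \infty} \lim_{\beta \to \infty} \frac{\langle \ln Z(\beta) \rangle_a}{\beta N} = \lim_{\beta \to \infty} F^{(RS)}(G, g, R, r)\Big|_{\rm sp}$$

where the subscript sp means that we compute the function $F^{(RS)}$ at the saddle point values of $G$, $g$, $R$ and $r$.

Footnote (on specializing to $\rho^\mu = 1/P$): A generic distribution $\rho^\mu$ can also be handled, though with heavier notations.

Footnote (on the integral representation of the delta function): For simplicity we have also done the transformation $r_{a,b} \to r_{a,b}/2$ for $a \neq b$ so that $\sum_{a \ge b} \to \sum_{a,b}$.

Footnote (on the saddle point): Note that, by Eq. (A.2), we shall also be interested in the limit $\beta \to \infty$ in the end!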

It is convenient to define the parameters 



a l + 77(l + x) 



In the limit $\beta \to \infty$, we first look for solutions where $g \to G$ and $\chi$, which we call susceptibility, remains finite. This implies that two replicas of the same system converge in the long run to the same stationary state. Using the saddle point equations, and $g \to G$, we can rewrite



The last term in Eq. (A.8) is dominated by the mixed strategy $\vec\pi^*(\vec z)$ which is the solution of

$$\vec\pi^*(\vec z) = \arg\min_{\vec\pi \in \Delta} V_z(\vec\pi). \qquad \text{(A.11)}$$

We find that $G = g = E_z[|\vec\pi^*(\vec z)|^2]$, which is then a function of $y$ only, $G = G(y)$. Upon defining $\zeta(y) = E_z[\vec z \cdot \vec\pi^*(\vec z)]$, we find

xiv) ^ 



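The second of Eqs. (A.9) becomes an equation for $y$ as a function of $\alpha$ which has two implicit solutions. These can be expressed as explicit solutions for $\alpha$ as a function of $y$ and $\eta$:

$$\alpha = \frac{1}{G} \left[ \frac{G - y\zeta \pm \sqrt{(G + y\zeta)^2 - 4\eta\, y \zeta G}}{2(1 - \eta)\, y} \right]^2 \qquad \text{(A.12)}$$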
A.1 $\eta < 0$



The solutions of Eq. (A.12), for $\eta < 0$, describe the two branches $\alpha < \alpha_c$ and $\alpha > \alpha_c$. In particular for $\eta \to 0^-$ these solutions become

$$\alpha y^2 = G(y), \qquad \alpha G(y) = \zeta^2(y). \qquad \text{(A.13)}$$



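Let us discuss first the case $\eta \to 0^-$: The free energy per agent is

$$\lim_{N \to \infty} \frac{H_0}{N} = -\lim_{\beta \to \infty} \frac{\langle \ln Z \rangle_a}{\beta N} = \frac{G}{(1 + \chi)^2} \qquad \text{(A.14)}$$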



These equations are transcendental and we could not find an explicit solution for generic $S$. Nevertheless, they represent a great simplification with respect to the original problem. The main technical difficulty lies in the evaluation of the functions $G(y) = E_z[|\vec\pi^*(\vec z)|^2]$ and $\zeta(y) = E_z[\vec z \cdot \vec\pi^*(\vec z)]$, which can be computed numerically to any desired accuracy for all $S$.

The first of Eqs. (A.13) gives the $\alpha > \alpha_c$ phase. This solution has $\chi > 0$ finite and $H > 0$ non-zero. As $\alpha$ decreases $\chi$ increases, and it diverges as $|\alpha - \alpha_c|^{-1}$ when $\alpha \to \alpha_c$. In this limit Eq. (A.14) implies that $H \sim |\alpha - \alpha_c|^2$ vanishes. The critical point $\alpha_c = \alpha(y_c)$ is obtained by imposing $\chi = \infty$, which gives $G(y_c) = -y_c \zeta(y_c)$. By the numerical evaluation of the functions $G(y)$ and $\zeta(y)$, we find



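$$\alpha_c(S) \simeq \alpha_c(2) + \frac{S - 2}{2} \qquad \text{(A.15)}$$

to a high degree of accuracy. It might be that this equation is exact but we could not prove it. An interesting relation for $\alpha_c(S)$ can be derived by algebraic considerations. Note that for each $\pi_{s,i} > 0$ the equation

$$\frac{\partial H}{\partial \pi_{s,i}} = 2 \sum_{s',j} \overline{a_{s,i}\, a_{s',j}}\; \pi_{s',j} = 0 \qquad \text{(A.16)}$$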



must hold. This is a set of linear equations in the variables $\pi_{s,i} > 0$. The $NS \times NS$ matrix $\overline{a_{s,i}\, a_{s',j}}$ is built with $P$-dimensional vectors $a^\mu_{s,i}$ and therefore has at most rank $P$. In other words there are only $P$ independent equations (A.16). In addition there are $N$ normalization conditions on $\pi_{s,i}$. The system becomes dynamically degenerate when the number of free variables $\pi_{s,i}$ becomes bigger than the number $P + N$ of independent equations and, exactly at $\alpha_c$, the two are equal. Dividing this condition by $N$ gives the desired equation

$$\sum_{s=1}^{S} E_z\big\{ \theta[\pi_s^*(\vec z)] \big\} = \alpha_c(S) + 1. \qquad \text{(A.17)}$$

The left hand side is the average number of strategies used by agents (called $n$ in section 8). Note that this equation implies that $\alpha_c(S)$ cannot grow faster than linearly in $S$. Also $\alpha_c(S) \propto S/2$ implies that agents use on average 1/2 of their strategies at $\alpha_c$.
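The rank bound used in this counting argument can be checked directly on a small random instance. The sketch below draws one $P$-dimensional $\pm 1$ vector per (agent, strategy) pair and computes the exact rank of the resulting $NS \times NS$ overlap matrix. The function names are hypothetical, and the overlaps are left unnormalized (sums over $\mu$ rather than averages), which does not change the rank.

```python
import random
from fractions import Fraction


def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            if A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r


def overlap_matrix(N, S, P, seed=0):
    """Unnormalized overlaps between the N*S strategy vectors of dimension P."""
    rng = random.Random(seed)
    vectors = [[rng.choice((-1, 1)) for _ in range(P)] for _ in range(N * S)]
    return [[sum(u[m] * v[m] for m in range(P)) for v in vectors]
            for u in vectors]


if __name__ == "__main__":
    N, S, P = 6, 2, 4
    M = overlap_matrix(N, S, P)
    # The matrix is 12 x 12 but its rank cannot exceed P = 4.
    print(len(M), rank(M))
```

With $NS = 12$ vectors in $P = 4$ dimensions the matrix has many more rows than its rank, which is the degeneracy the argument exploits.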

The second of Eqs. (A.13) gives the $\alpha < \alpha_c$ phase. Note indeed that with this choice $\chi \to \infty$ and $H \sim \eta^2 \to 0$ as $\eta \to 0^-$. At odds with the solution for $\alpha > \alpha_c$, this equation only arises if $\eta < 0$ and in the limit $\eta \to 0^-$. With $\eta = 0$ the saddle point equations have only a solution with $G > g$ in the limit $\beta \to \infty$. This is because for $\alpha < \alpha_c$ the set $\mathcal{M}$ where $H = 0$ is not a single point, but rather a connected set. The replica method with $\eta = 0$ takes an average over the whole set $\mathcal{M}$ and so it gives results which are not representative of a particular system. In order to select a single point in $\mathcal{M}$ one may consider the limit $\eta \to 0^-$. Since the term $-\eta N G$ in $H_\eta$ breaks the degeneracy of equilibria for $\eta = 0$, the limit $\eta \to 0^-$ selects the equilibrium which is closest to the random initial condition $\pi_{s,i}(0) = 1/S$ for all $i \in \mathcal{N}$ and $s = 1, \ldots, S$. This describes the stationary state of a system of agents with no prior beliefs ($U_{s,i}(t = 0) = 0$, $\forall s, i$).

In both phases, once the saddle point equations are solved, one can derive the full statistical characterization of the system. For instance the fraction of agents playing strategies in a neighborhood $d\vec\pi$ of $\vec\pi$ is given by $p(\vec\pi)\, d\vec\pi = E_z[\delta(\vec\pi^*(\vec z) - \vec\pi)]\, d\vec\pi$.



A.2 $\eta > 0$



Let us for simplicity consider the simpler case $S = 2$. The solution with $G = g$ exists for $\alpha > 1/\pi$. For $\alpha > [\pi(1 - \eta)^2]^{-1}$ this solution has $G = g < 1$, which means that agents do not all play pure strategies. When $\alpha \to [\pi(1 - \eta)^2]^{-1}$, $G \to 1$ and the solution becomes independent of $\eta$. In other words, the solution merges with the solution for $\eta = 1$. In its turn this solution breaks down, with $\chi \to \infty$ and $H_1/N = \sigma^2/N$ when $\alpha \to 1/\pi$. Below this point, only solutions with $g < G$ and $H_1/N = 0$ exist. This behavior is well documented in figure 4. However, for $\eta > 0$, one needs to go beyond the simple approximation for $G_{a,b}$ and $r_{a,b}$ in Eq. (A.7). Therefore we shall refrain from a more detailed discussion and rather refer the interested reader to a forthcoming publication [23].

Footnote (on the $\eta = 0$ solution for $\alpha < \alpha_c$ in A.1): Note indeed that $g$ has the interpretation of the overlap between two replicas of the same system, so that $g < G$ means that the two replicas are not identical.